Abstract. We focus on credal nets, which are graphical models that generalise Bayesian 
nets to imprecise probability. We replace the notion of strong independence commonly 
used in credal nets with the weaker notion of epistemic irrelevance, which is arguably more 
suited for a behavioural theory of probability. Focusing on directed trees, we show how to 
combine the given local uncertainty models in the nodes of the graph into a global model, 
and we use this to construct and justify an exact message-passing algorithm that computes 
updated beliefs for a variable in the tree. The algorithm, which is linear in the number of 
nodes, is formulated entirely in terms of coherent lower previsions, and is shown to satisfy 
a number of rationality requirements. We supply examples of the algorithm's operation, 
and report an application to on-line character recognition that illustrates the advantages of 
our approach for prediction. We comment on the perspectives opened by the availability, 
for the first time, of a truly efficient algorithm based on epistemic irrelevance. 



1. Introduction 

The last twenty years have witnessed a rapid growth of graphical models in the fields 
of artificial intelligence and statistics. These models combine graphs and probability to 
address complex multivariate problems in a variety of domains, such as medicine, finance, 
risk analysis, defence, and environment, to name just a few. 

Much has also been done on the imprecise-probability front. In particular, credal nets 
[4] have been and still are the subject of intense research. A credal net creates a global 
model of a domain by combining local uncertainty models using some notion of independ- 
ence, and then uses this to do inference. The local models represent uncertainty by closed 
convex sets of probabilities, also called credal sets. 

The notion of independence used with credal nets in the vast majority of cases is that 
of strong independence (with some exceptions in [8]). Loosely speaking, two variables 
X,Y are strongly independent if the credal set for (X,Y) can be regarded as originating 
from a number of precise models in each of which X and Y are stochastically independent. 
Strong independence is closely related to the sensitivity analysis interpretation of credal 
sets, which regards an imprecise model as arising out of partial ignorance of a precise one. 

In the particular case of credal nets, strong independence leads to a mathematical equi- 
valence: a credal net model is equivalent to a model consisting of a set of Bayesian nets, 
each with the same graph but with different values for the parameters. The sensitivity ana- 
lysis interpretation is then that there is some (kind of ideal) Bayesian net model of the 
problem under consideration, and the graph of such a net is known. But, for some reason, 
the net's parameters are not known precisely, and that is why one considers the set of all the 
Bayesian nets that are consistent with the partial specification of the parameters. Common 
causes for the existence of partial knowledge are the cost of, and time constraints on, eli- 
citing parameters, and disagreement amongst a group of experts consulted for that purpose. 
Non-ignorable missing data can be another reason, in case the parameters are inferred from 
a data set [29]. 
The sensitivity analysis interpretation of imprecise-probability models, and hence strong 
independence, is not always applicable. A notable case arises when one wishes to model an 
expert's beliefs: it is then not always tenable that there should be some ideal Bayesian net 
that models these beliefs, and that it is only because of our limited resources that we cannot 
define it precisely. Rather, it seems more reasonable to concede that expert knowledge may 
be inherently imprecise to some extent.¹ This simple observation makes the sensitivity 
analysis interpretation fail, and hence it makes strong independence an inadequate model, 
in general, for such a situation.² 

An alternative and attractive approach to expressing irrelevance that is not committed 
to the sensitivity analysis interpretation is offered by epistemic irrelevance [24]: we say 
that X is epistemically irrelevant to Y if observing X does not affect our beliefs about Y. In 
other words, by making an epistemic irrelevance assessment, a subject states that her belief 
model about Y does (or will) not change after receiving information about X. When the 
belief model is a precise probability, both epistemic irrelevance and strong independence 
reduce to the usual (stochastic) independence.³ But when the model is a set of probabilities, 
this is no longer the case, because, in contradistinction with strong independence, 
epistemic irrelevance is a property of this set that cannot be explained using properties of 
the precise probabilities in the set. Epistemic irrelevance is defined directly in terms of a 
subject's belief model (the set of probabilities). For this reason, it is very well suited for 
a behavioural theory of imprecise probability. Contrary to strong independence, it is not 
a symmetrical notion: generally speaking, the epistemic irrelevance of X to Y does not 
entail the epistemic irrelevance of Y to X. It is also weaker than strong independence, in 
the sense that strong independence implies epistemic irrelevance: sets of probabilities that 
correspond to assessments of epistemic irrelevance usually include those related to strong 
independence assessments. It therefore does not lead to overconfident inferences when the 
sensitivity analysis interpretation is not justified. 

At this point, the question we address in this paper should be clear: can we define credal 
nets based on epistemic irrelevance, and moreover create an exact algorithm to perform 
efficient inferences with them? We give a fully positive answer to this question in the 
special case that (i) the graph under consideration is a directed tree, and (ii) the related 
variables assume finitely many values. The intuitions that showed us the way towards this 
result originated in previous work done by some of us on imprecise probability trees [9] 
and imprecise Markov chains [10]. 

How do we address this problem? 

In Section 2, we discuss some preliminary graph-theoretic notions, and define the local 
uncertainty models that will be used at each node of a tree. These models are formalised 
through the language of coherent lower previsions [24]. We discuss how such local models 
will give rise to a global uncertainty model, which plays the same role as the joint mass 
function built by the chain rule in a Bayesian net. Based on the global model, we state 
the Markov condition that defines the imprecise-probability interpretation of our credal 
trees. As announced before, this Markov condition involves epistemic irrelevance rather 
than strong independence. 

In Section 3, we take a brief detour to discuss in general terms how to combine mar- 
ginal models into joint ones using irrelevance assessments, in a way that is as conservative 
as possible. We do so because the notion of so-called epistemic independence, which arises 
out of a symmetrisation of epistemic irrelevance, has so far been defined in the literature 
only for the case of two variables. We define and discuss the independent natural extension 
of a number of marginals. This is the most conservative joint model that arises out of the 
marginals and epistemic independence alone. Moreover, we show that the independent natural 
extension has a very important strong factorisation property, which has a crucial part 
in our algorithm for updating credal trees under epistemic irrelevance. 

¹ For a detailed argumentation and exposition of this point of view, we refer to [24, Chapter 5]. 

² Obviously, there will be special cases where strong independence is justified in order to model an expert's 
knowledge. Moreover, strong independence could provide a good approximation to more accurate models, even 
when it is not entirely appropriate. This is something that seems to deserve further investigation. 

³ We ignore issues related to events with probability zero. 

In Section 4, we turn to the problem of constructing the most conservative global model 
based only on the local models in the tree and our Markov condition. We show that this task 
can be achieved by a recursive construction that proceeds from the leaves to the root of the 
tree using two operations: the independent natural extension discussed in Section 3, and 
the marginal extension, defined and studied in [24, 17]. We also show that all uncertainty 
models we consider, the local ones as well as the global ones that we create, satisfy a con- 
sistency criterion that generalises (and is based on the same ideas as) the usual consistency 
criterion in Bayesian nets: they are (separately and jointly) coherent [24, 15, 16, 27] (see 
in particular [18, Section 8.1]). This is an important rationality requirement. 

We briefly comment on some of the graphical separation criteria induced by epistemic 
irrelevance in Section 5. We then go on to develop and justify an algorithm for making 
inferences on credal trees under epistemic irrelevance in Section 6. The algorithm is used to 
update the tree: it computes posterior beliefs about a target variable in the tree conditional 
on the observation of other variables, which are called instantiated, meaning that their value 
is determined. It can in particular be used for treating the model as an expert system. 

Our algorithm is based on message passing, as are the traditional algorithms that have 
been developed for precise graphical models. It has some remarkable properties: (i) it 
works in time linear in the number of nodes in the tree; (ii) it natively computes posterior 
lower and upper previsions (or expectations) rather than probabilities; (iii) it is the first al- 
gorithm developed for credal nets that exclusively uses the formalism of coherent lower 
previsions; and (iv) it is shown that, under very mild conditions, using the tree for updating 
beliefs cannot lead to inferences that are inconsistent with the local models we have started 
from, nor with one another. 

We give a step-by-step example of the way inferences can be done using our algorithm 
in Section 7. We also comment there on the intriguing relationship between the failure of 
certain classical separation properties in our framework, and dilation [14, 22]. 

The last part of the paper focuses on numerical simulations. In Section 8 we empirically 
measure the amount of imprecision introduced by using epistemic irrelevance rather than 
strong independence in a credal tree, when propagating inferences backwards (towards the 
root) from instantiated nodes to the target node. Indeed, it can be shown [9] that there is no 
difference between inferences that go forward from instantiated nodes to the target node un- 
der strong independence and epistemic irrelevance. In Section 9 we present an application 
of our algorithm to on-line character recognition. We learn the probabilities from data and 
compare the predictions of our approach with those of its precise probability counterpart. 
The results are encouraging: they show that the tree can be used for real applications, and 
that the imprecision it introduces is justified. 

In order to keep this paper reasonably short, we have to assume the reader has a good 
working knowledge of the basics of Peter Walley's [24] theory of coherent lower previsions. 
This is needed in particular for the most important proofs, collected in the Appendix. For a 
fairly detailed discussion of the coherence notions and results needed in the context of this 
paper, we refer to recent work by Enrique Miranda [15, 16]. 

2. Credal trees under epistemic irrelevance 

2.1. Basic notions and notation. We consider a rooted and directed discrete tree with 
finite width and depth. We call $T$ the set of its nodes $s$, and we denote the root, or initial, 
node by $\square$. For any node $s$, we denote its mother node by $m(s)$. Of course, $\square$ has no mother 
node, and we use the convention $m(\square) = \emptyset$. Also, for each node $s$, we denote the set of its 
children by $C(s)$, and the set of its siblings by $S(s)$. Clearly, $S(\square) = \emptyset$, and if $s \neq \square$ then 
$S(s) = C(m(s)) \setminus \{s\}$. If $C(s) = \emptyset$, then we call $s$ a leaf, or terminal node. We denote by 
$T^{\lozenge} := \{s \in T : C(s) \neq \emptyset\}$ the set of all non-terminal nodes. 

For nodes $s$ and $t$, we write $s \sqsubseteq t$ if $s$ precedes $t$, i.e., if there is a directed segment in the 
tree from $s$ to $t$. The relation $\sqsubseteq$ is a special partial order on the set $T$. $A(s) := \{r \in T : r \sqsubset s\}$ 
denotes the chain of ancestors of $s$, and $D(s) := \{t \in T : s \sqsubset t\}$ its set of descendants. Here 
$s \sqsubset t$ means that $s \sqsubseteq t$ and $s \neq t$. We also use the notation ${\uparrow}s := A(s) \cup \{s\}$ for the chain 
(segment) connecting $\square$ and $s$, and ${\downarrow}s := D(s) \cup \{s\}$ for the sub-tree with root $s$. Similarly, 
we let ${\uparrow}S := \bigcup\{{\uparrow}s : s \in S\}$ and ${\downarrow}S := \bigcup\{{\downarrow}s : s \in S\}$ for any subset $S \subseteq T$. For any node $s$, 
its set of non-parent non-descendants is given by $\overline{s} := T \setminus (\{m(s)\} \cup {\downarrow}s)$. 
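As an illustration only, the relations just defined can be computed on a small hypothetical tree. In this Python sketch the node names and the `MOTHER` table are assumptions, the string `"root"` stands in for $\square$, and `None` plays the role of the convention $m(\square) = \emptyset$.

```python
# Hypothetical example tree: every node is mapped to its mother; the root has None.
MOTHER = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a"}


def children(s):
    """C(s): the nodes whose mother is s."""
    return {t for t, m in MOTHER.items() if m == s}


def siblings(s):
    """S(s) = C(m(s)) minus {s}; empty for the root."""
    m = MOTHER[s]
    return set() if m is None else children(m) - {s}


def ancestors(s):
    """A(s): all strict predecessors of s, found by walking up the mother links."""
    result, m = set(), MOTHER[s]
    while m is not None:
        result.add(m)
        m = MOTHER[m]
    return result


def descendants(s):
    """D(s): all nodes t such that s strictly precedes t."""
    return {t for t in MOTHER if s in ancestors(t)}


def subtree(s):
    """The sub-tree with root s: D(s) together with s itself."""
    return descendants(s) | {s}


def non_parent_non_descendants(s):
    """T minus ({m(s)} union the sub-tree with root s)."""
    return set(MOTHER) - ({MOTHER[s]} | subtree(s))


def non_terminal_nodes():
    """All nodes with at least one child."""
    return {s for s in MOTHER if children(s)}
```

For instance, `non_parent_non_descendants("c")` returns `{"root", "b", "d"}`: node `a` is the mother of `c` and is therefore excluded.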

With each node $s$ of the tree, there is associated a variable $X_s$ assuming values in a 
non-empty finite set $\mathcal{X}_s$. We denote by $\mathcal{L}(\mathcal{X}_s)$ the set of all real-valued maps (also called 
gambles) on $\mathcal{X}_s$. We extend this notation to more complicated situations as follows. If $S$ is 
any subset of $T$, then we denote by $X_S$ the tuple of variables whose components are the $X_s$ 
for all $s \in S$. This new joint variable assumes values in the finite set $\mathcal{X}_S := \times_{s \in S}\mathcal{X}_s$, and the 
corresponding set of gambles is denoted by $\mathcal{L}(\mathcal{X}_S)$.⁴ Generic elements of $\mathcal{X}_s$ are denoted 
by $x_s$ or $z_s$. Similarly for $x_S$ and $z_S$ in $\mathcal{X}_S$. Also, if we mention a tuple $z_S$, then for any $t \in S$, 
the corresponding element in the tuple will be denoted by $z_t$. We assume all variables in 
the tree to be logically independent, meaning that the variable $X_S$ may assume all values in 
$\mathcal{X}_S$, for all $\emptyset \subseteq S \subseteq T$. 

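We will frequently use the simplifying device of identifying a gamble $f_S$ on $\mathcal{X}_S$ with its 
cylindrical extension to $\mathcal{X}_U$, where $S \subseteq U \subseteq T$. This is the gamble $f_U$ on $\mathcal{X}_U$ defined by 
$f_U(x_U) := f_S(x_S)$ for all $x_U \in \mathcal{X}_U$. To give an example, if $\mathcal{K} \subseteq \mathcal{L}(\mathcal{X}_T)$, this trick allows 
us to consider $\mathcal{K} \cap \mathcal{L}(\mathcal{X}_s)$ as the set of those gambles in $\mathcal{K}$ that depend only on the 
variable $X_s$. As another example, this device allows us to identify the gambles $I_{\{x_s\}}$ and 
$I_{\{x_s\} \times \mathcal{X}_{T \setminus \{s\}}}$, and therefore also the events $\{x_s\}$ and $\{x_s\} \times \mathcal{X}_{T \setminus \{s\}}$. More generally, for any 
event $A \subseteq \mathcal{X}_S$, we can identify the gambles $I_A$ and $I_{A \times \mathcal{X}_{T \setminus S}}$, and therefore also the events 
$A$ and $A \times \mathcal{X}_{T \setminus S}$. In the same spirit, a lower prevision on all gambles in $\mathcal{L}(\mathcal{X}_S)$ can be 
identified with a lower prevision defined on the set of corresponding gambles on $\mathcal{X}_T$, a 
subset of $\mathcal{L}(\mathcal{X}_T)$. 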
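The cylindrical-extension device can be sketched in a few lines of Python, assuming (hypothetically) that gambles are stored as dictionaries keyed by value tuples ordered like the list of nodes. The spaces below are made up. Note that `joint_space` returns the singleton `[()]` for an empty list of nodes, which matches the convention that the empty product $\mathcal{X}_\emptyset$ is a singleton.

```python
from itertools import product

# Hypothetical possibility spaces for two nodes of a tree.
SPACES = {"a": (0, 1), "b": ("h", "t")}


def joint_space(spaces, nodes):
    """All tuples x_S in the product of the spaces of the given nodes (in order).
    For an empty list of nodes this is the singleton [()], as for the empty product."""
    return list(product(*(spaces[s] for s in nodes)))


def cylindrical_extension(f_S, S, U, spaces):
    """Extend a gamble f_S on X_S (a dict keyed by tuples) to X_U, where S is a
    sub-list of U, by letting the extension ignore the coordinates outside S."""
    positions = [U.index(s) for s in S]
    return {x_U: f_S[tuple(x_U[i] for i in positions)] for x_U in joint_space(spaces, U)}


# A gamble on X_a and its cylindrical extension to X_{a,b}.
f_a = {(0,): 2.0, (1,): -1.0}
f_ab = cylindrical_extension(f_a, ["a"], ["a", "b"], SPACES)

# The indicator of the event {x_a = 1} is identified with that of {1} x X_b.
indicator_ab = cylindrical_extension({(0,): 0.0, (1,): 1.0}, ["a"], ["a", "b"], SPACES)
```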

Throughout the paper, we consider (conditional) lower previsions as models for a subject's 
beliefs about the values that certain variables in the tree may assume. We use a 
systematic notation for such (conditional) lower previsions. Let $I, O \subseteq T$ be disjoint sets of 
nodes with $O \neq \emptyset$; then we generically⁵ denote by $\underline{V}_O(\cdot|X_I)$ a conditional lower prevision, 
defined on the set of gambles $\mathcal{L}(\mathcal{X}_{I \cup O})$.⁶ For every gamble $f$ on $\mathcal{X}_{I \cup O}$ and every $x_I \in \mathcal{X}_I$, 
$\underline{V}_O(f|x_I)$ is the lower prevision (or lower expectation, or a subject's supremum buying 
price) of the gamble $f$, conditional on the event that $X_I = x_I$. We interpret $\underline{V}_O(f|X_I)$ 
as a real-valued map (gamble) on $\mathcal{X}_I$ that assumes the value $\underline{V}_O(f|x_I)$ in the element $x_I$ 
of $\mathcal{X}_I$. The conjugate conditional upper prevision $\overline{V}_O(\cdot|X_I)$ is defined on $\mathcal{L}(\mathcal{X}_{I \cup O})$ by 
$\overline{V}_O(f|X_I) := -\underline{V}_O(-f|X_I)$ for all gambles $f$ on $\mathcal{X}_{I \cup O}$. 
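On finite spaces, a coherent lower prevision can be represented as the lower envelope of a closed convex set of mass functions (a credal set). The Python sketch below assumes, for illustration, the unconditional case $I = \emptyset$ and a credal set given by finitely many extreme points with made-up numbers; it computes a lower prevision and its conjugate upper prevision. The function and variable names are hypothetical.

```python
def expectation(p, f):
    """Linear prevision of the gamble f (dict) under the mass function p (dict)."""
    return sum(p[x] * f[x] for x in p)


def lower_prevision(credal_set, f):
    """Lower prevision as the lower envelope over finitely many extreme points.
    Minimising over the extreme points suffices because expectation is linear."""
    return min(expectation(p, f) for p in credal_set)


def upper_prevision(credal_set, f):
    """Conjugate upper prevision: upper(f) = -lower(-f)."""
    return -lower_prevision(credal_set, {x: -v for x, v in f.items()})


# Hypothetical credal set on {r, g, b}, given by two extreme points.
CREDAL_SET = [{"r": 0.5, "g": 0.3, "b": 0.2}, {"r": 0.2, "g": 0.3, "b": 0.5}]
F = {"r": 1.0, "g": 0.0, "b": 0.0}  # indicator of the event {r}
```

Here the lower and upper probabilities of the event $\{r\}$ are 0.2 and 0.5.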

We will always implicitly assume that all conditional models $\underline{V}_O(\cdot|X_I)$ are separately coherent, meaning that: 

SC1. $\underline{V}_O(f|x_I) \geq \min_{z_O \in \mathcal{X}_O} f(x_I, z_O)$ for all $f \in \mathcal{L}(\mathcal{X}_{I \cup O})$ and all $x_I \in \mathcal{X}_I$ [accepting partial gains]; 

SC2. $\underline{V}_O(f_1 + f_2|x_I) \geq \underline{V}_O(f_1|x_I) + \underline{V}_O(f_2|x_I)$ for all $f_1, f_2 \in \mathcal{L}(\mathcal{X}_{I \cup O})$ and all $x_I \in \mathcal{X}_I$ [super-additivity]; 

SC3. $\underline{V}_O(\lambda f|x_I) = \lambda \underline{V}_O(f|x_I)$ for all $f \in \mathcal{L}(\mathcal{X}_{I \cup O})$, all non-negative real $\lambda$ and all $x_I \in \mathcal{X}_I$ [non-negative homogeneity]. 

⁴ For any subset $S$ of $T$, $\mathcal{X}_S$ is defined formally as the set of all maps $x_S$ from $S$ to $\bigcup_{s \in S}\mathcal{X}_s$ such that $x_S(s) = 
x_s \in \mathcal{X}_s$ for all $s \in S$. So when $S = \emptyset$, the empty product $\mathcal{X}_\emptyset$ is defined as the set of all maps from $\emptyset$ to $\emptyset$, which 
is a singleton. The corresponding variable $X_\emptyset$ can then only assume this single value, so there is no uncertainty 
about it. $\mathcal{L}(\mathcal{X}_\emptyset)$ can be identified with the set $\mathbb{R}$ of real numbers. 

⁵ Besides the letter $V$, we will also use the letters $P$, $Q$ and $R$. 

⁶ In keeping with the observation in footnote 4, we also allow $I = \emptyset$, which means conditioning on the variable $X_I = X_\emptyset$, which can only assume one single value. This means that $\underline{V}_O(\cdot|X_\emptyset) = \underline{V}_O$ effectively becomes an 
unconditional lower prevision on $\mathcal{L}(\mathcal{X}_{O \cup \emptyset}) = \mathcal{L}(\mathcal{X}_O)$. This is a very useful device that allows us to use the same 
generic notation for both conditional and unconditional lower previsions. 
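By combining SC1-SC3, it follows that for all $f \in \mathcal{L}(\mathcal{X}_{I \cup O})$ and $x_I \in \mathcal{X}_I$: 

$$\min_{z_O \in \mathcal{X}_O} f(x_I, z_O) \leq \underline{V}_O(f|x_I) \leq \overline{V}_O(f|x_I) \leq \max_{z_O \in \mathcal{X}_O} f(x_I, z_O).$$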

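If we let $f$ be the indicator $I_{\{z_I\}}$ of the set $\{z_I\}$ in these inequalities, they reduce to the 
following, intuitively obvious, property:⁷ 

SC4. $\underline{V}_O(\{z_I\} \times \mathcal{X}_O|x_I) = \overline{V}_O(\{z_I\} \times \mathcal{X}_O|x_I) = I_{\{x_I\}}(z_I)$ for all $x_I, z_I \in \mathcal{X}_I$. 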
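As a numerical sanity check (not a proof), the hypothetical sketch below takes the unconditional case $I = \emptyset$, represents $\underline{V}_O$ as a lower envelope over finitely many mass functions, and verifies SC1-SC3 and the resulting bounds on randomly drawn gambles. The credal set, seed and tolerance are assumptions made for illustration.

```python
import random


def lower(credal_set, f):
    """Lower envelope of the expectations of f over the extreme points."""
    return min(sum(p[x] * f[x] for x in p) for p in credal_set)


def upper(credal_set, f):
    """Conjugate upper prevision."""
    return -lower(credal_set, {x: -v for x, v in f.items()})


def check_separate_coherence(credal_set, trials=1000, seed=0, tol=1e-9):
    """Test SC1-SC3 and min f <= lower(f) <= upper(f) <= max f on random gambles."""
    rng = random.Random(seed)
    space = list(credal_set[0])
    for _ in range(trials):
        f = {x: rng.uniform(-1, 1) for x in space}
        g = {x: rng.uniform(-1, 1) for x in space}
        lam = rng.uniform(0, 5)
        lo_f, up_f = lower(credal_set, f), upper(credal_set, f)
        # SC1: accepting partial gains
        assert lo_f >= min(f.values()) - tol
        # SC2: super-additivity
        f_plus_g = {x: f[x] + g[x] for x in space}
        assert lower(credal_set, f_plus_g) >= lo_f + lower(credal_set, g) - tol
        # SC3: non-negative homogeneity
        scaled = {x: lam * f[x] for x in space}
        assert abs(lower(credal_set, scaled) - lam * lo_f) <= tol
        # The bounds that follow from SC1-SC3
        assert min(f.values()) - tol <= lo_f <= up_f + tol
        assert up_f <= max(f.values()) + tol
    return True


CREDAL_SET = [{"r": 0.5, "g": 0.3, "b": 0.2}, {"r": 0.2, "g": 0.3, "b": 0.5}]
```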
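From SC1, SC2 and SC4 we can also derive that, with obvious notations: 

$$\underline{V}_O(f|x_I) = \underline{V}_O(I_{\{x_I\}} f|x_I) = \underline{V}_O(f(x_I, \cdot)|x_I) \quad \text{for all gambles } f \text{ on } \mathcal{X}_{I \cup O} \text{ and } x_I \in \mathcal{X}_I, \qquad (1)$$

where $f(x_I, \cdot)$ is a partial map defined on $\mathcal{X}_O$. This implies that $\underline{V}_O(\cdot|X_I)$ is completely 
determined by its behaviour on (cylindrical extensions of maps in) $\mathcal{L}(\mathcal{X}_O)$. 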
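Property (1) can be checked on a small example. The sketch below assumes, hypothetically, that $\underline{V}_O(\cdot|X_I)$ is given by one finite credal set on $\mathcal{X}_O$ for each value $x_I$; each extreme point is embedded as a mass function on $\mathcal{X}_I \times \mathcal{X}_O$ concentrated on $\{x_I\} \times \mathcal{X}_O$, and the lower prevision computed on the whole joint space is compared with the lower prevision of the partial map $f(x_I, \cdot)$. All names and numbers are illustrative.

```python
# Hypothetical conditional model: for each value x_I, the extreme points of a credal set on X_O.
SPACE_I = ("lo", "hi")
COND = {
    "lo": [{0: 0.9, 1: 0.1}, {0: 0.7, 1: 0.3}],
    "hi": [{0: 0.2, 1: 0.8}, {0: 0.4, 1: 0.6}],
}


def embed(p, x_I):
    """Mass function on X_I x X_O concentrated on {x_I} x X_O, built from p on X_O."""
    return {(z_I, x_O): (p[x_O] if z_I == x_I else 0.0) for z_I in SPACE_I for x_O in p}


def conditional_lower(f, x_I):
    """V_O(f|x_I) for a gamble f on X_I x X_O, computed on the whole joint space."""
    return min(sum(q[k] * f[k] for k in q) for q in (embed(p, x_I) for p in COND[x_I]))


def lower_of_partial_map(f, x_I):
    """Right-hand side of (1): lower prevision of the partial map f(x_I, .) on X_O."""
    return min(sum(p[x_O] * f[(x_I, x_O)] for x_O in p) for p in COND[x_I])


F = {("lo", 0): 1.0, ("lo", 1): 3.0, ("hi", 0): -2.0, ("hi", 1): 5.0}
```

With these numbers, $\underline{V}_O(f|\mathrm{lo}) = 1.2$ and $\underline{V}_O(f|\mathrm{hi}) = 2.2$, whichever side of (1) is used.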

Hereafter, we will frequently introduce conditional lower previsions of the type $\underline{V}_O(\cdot|X_I)$ 
as if they are defined on $\mathcal{L}(\mathcal{X}_O)$, simply because that is a very natural thing to do: such a 
conditional lower prevision is usually interpreted as representing beliefs about the variable 
$X_O$, conditional on values of the variable $X_I$. But the reader should keep in mind that, by the 
separate coherence property (1), $\underline{V}_O(\cdot|X_I)$ can (and should) always be uniquely extended 
to the larger domain $\mathcal{L}(\mathcal{X}_{I \cup O})$. 

As soon as we consider a number of such conditional lower previsions $\underline{V}_{O_k}(\cdot|X_{I_k})$, $k = 1, \dots, n$, 
they should satisfy more stringent consistency criteria than that each of them 
should be separately coherent: they should also be consistent with one another in the sense 
of Walley's (joint) coherence [24, Section 7.1.4(b)]. For more details about this much more 
involved type of coherence, we refer also to [15, 16]. 

Finally, let us introduce one of the most important concepts for this paper, that of epistemic 
irrelevance. We describe the case of conditional irrelevance, as the unconditional 
version of epistemic irrelevance can easily be recovered as a special case.⁸ 

Consider three disjoint subsets $C$, $I$, and $O$ of $T$, where both $I$ and $O$ are non-empty. 
When a subject judges $X_I$ to be epistemically irrelevant to $X_O$ conditional on $X_C$, he 
assumes that if he knows the value of $X_C$, then learning in addition which value $X_I$ assumes 
in $\mathcal{X}_I$ will not affect his beliefs about $X_O$. More formally, assume that a subject has a 
separately coherent conditional lower prevision $\underline{V}_O(\cdot|X_C)$ on $\mathcal{L}(\mathcal{X}_O)$. If he assesses $X_I$ to be 
epistemically irrelevant to $X_O$ conditional on $X_C$, this implies that he can infer from his 
model $\underline{V}_O(\cdot|X_C)$ a conditional model $\underline{V}_O(\cdot|X_{C \cup I})$ on $\mathcal{L}(\mathcal{X}_O)$ given by 

$$\underline{V}_O(f|x_{C \cup I}) := \underline{V}_O(f|x_C) \quad \text{for all } f \in \mathcal{L}(\mathcal{X}_O) \text{ and all } x_{C \cup I} \in \mathcal{X}_{C \cup I}.$$
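Operationally, this irrelevance assessment simply copies the model for $x_C$ to every value $x_{C \cup I}$ that extends it. The following Python sketch illustrates this with a hypothetical unconditional model ($C = \emptyset$, so $X_C$ only takes the single value `()`); the names `MODEL` and `irrelevant_extension` and all numbers are illustrative assumptions.

```python
from itertools import product


def lower(credal_set, f):
    """Lower envelope over finitely many extreme points."""
    return min(sum(p[x] * f[x] for x in p) for p in credal_set)


def irrelevant_extension(model_given_C, space_I):
    """From V_O(.|X_C), keyed by x_C, build V_O(.|X_{C u I}), keyed by (x_C, x_I):
    under epistemic irrelevance of X_I to X_O given X_C, the model for
    (x_C, x_I) is simply the model for x_C."""
    return {(x_C, x_I): model_given_C[x_C] for x_C, x_I in product(model_given_C, space_I)}


# C is empty, so X_C has the single value (): the model is in effect unconditional.
MODEL = {(): [{"rain": 0.3, "dry": 0.7}, {"rain": 0.5, "dry": 0.5}]}
EXTENDED = irrelevant_extension(MODEL, ("up", "down"))
RAIN = {"rain": 1.0, "dry": 0.0}
```

Whatever value $X_I$ takes, the lower probability of rain stays 0.3.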

2.2. Local uncertainty models. We now add a local uncertainty model to each of the 
nodes $s$. If $s$ is not the root node, i.e. has a mother $m(s)$, then this local model is a (separately 
coherent) conditional lower prevision $\underline{Q}_s(\cdot|X_{m(s)})$ on $\mathcal{L}(\mathcal{X}_s)$: for each possible value 
$z_{m(s)}$ of the variable $X_{m(s)}$ associated with its mother $m(s)$, we have a coherent lower prevision 
$\underline{Q}_s(\cdot|z_{m(s)})$ for the value of $X_s$, conditional on $X_{m(s)} = z_{m(s)}$. In the root, we have an 
unconditional local uncertainty model $\underline{Q}_\square$ for the value of $X_\square$; $\underline{Q}_\square$ is a (separately) coherent 
lower prevision on $\mathcal{L}(\mathcal{X}_\square)$. We use the common generic notation $\underline{Q}_s(\cdot|X_{m(s)})$ for all 
these local models.⁹ 
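One possible (hypothetical) data structure for these local models is a table indexed by node and by the value of the mother variable, each entry listing the extreme points of a credal set. The tree, the spaces and all numbers below are made up for illustration; the root model is stored under the single key `()`, mirroring the convention that conditioning on $X_{m(\square)}$ amounts to not conditioning at all.

```python
# Hypothetical tree and possibility spaces; "root" stands in for the root node.
MOTHER = {"root": None, "a": "root", "b": "root"}
SPACES = {"root": ("x", "y"), "a": (0, 1), "b": (0, 1)}

# Q[s][z]: extreme points of the credal set for X_s given X_{m(s)} = z.
# The root's model is unconditional, so it is stored under the single key ().
Q = {
    "root": {(): [{"x": 0.6, "y": 0.4}, {"x": 0.4, "y": 0.6}]},
    "a": {"x": [{0: 0.9, 1: 0.1}, {0: 0.8, 1: 0.2}], "y": [{0: 0.3, 1: 0.7}]},
    "b": {"x": [{0: 0.5, 1: 0.5}], "y": [{0: 0.1, 1: 0.9}, {0: 0.2, 1: 0.8}]},
}


def validate_local_models(tol=1e-9):
    """Check that every node has one credal set per value of its mother variable,
    and that every extreme point is a mass function on X_s."""
    for s, table in Q.items():
        m = MOTHER[s]
        assert set(table) == ({()} if m is None else set(SPACES[m]))
        for credal_set in table.values():
            for p in credal_set:
                assert set(p) == set(SPACES[s])
                assert all(v >= 0 for v in p.values())
                assert abs(sum(p.values()) - 1.0) <= tol
    return True


def local_lower(s, f, z_mother=()):
    """Q_s(f | z_mother): lower envelope of the local credal set."""
    return min(sum(p[x] * f[x] for x in p) for p in Q[s][z_mother])
```

For example, `local_lower("a", {0: 0.0, 1: 1.0}, "x")` gives the lower probability 0.1 that $X_a = 1$ given that the root takes the value `"x"`.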

⁷ For any event $A \subseteq \mathcal{X}_{I \cup O}$, we denote $\underline{V}_O(I_A|x_I)$ also as $\underline{V}_O(A|x_I)$ and call this real number the (conditional) 
lower probability of $A$. Similarly $\overline{V}_O(A|x_I) := \overline{V}_O(I_A|x_I)$ is the (conditional) upper probability of $A$. 

⁸ It suffices, in the discussion below, to let $C = \emptyset$. As we indicated in footnote 4, this makes sure the variable 
$X_C$ has only one possible value, so conditioning on that variable amounts to not conditioning at all. 

⁹ We can do this because $X_{m(\square)} = X_\emptyset$ has only one possible value, so conditioning on that variable amounts to 
not conditioning at all. 
2.3. Global uncertainty models. We intend to show in Section 4 how all these local models 
$\underline{Q}_s(\cdot|X_{m(s)})$ can be combined into global uncertainty models. We generically denote 
such global models using the letter $P$. More specifically, we want to end up with an unconditional 
joint lower prevision $\underline{P} := \underline{P}_{\downarrow\square} = \underline{P}_T$ on $\mathcal{L}(\mathcal{X}_T)$ for all variables in the tree, as 
well as conditional lower previsions $\underline{P}_{\downarrow S}(\cdot|X_s)$ on $\mathcal{L}(\mathcal{X}_{\downarrow S})$ for all non-terminal nodes $s$ and 
all non-empty $S \subseteq C(s)$. 

Ideally, we want these global (conditional) lower previsions (i) to be compatible with 
the local assessments $\underline{Q}_s(\cdot|X_{m(s)})$, $s \in T$, (ii) to be coherent with one another, and (iii) to 
reflect the conditional irrelevancies (or Markov-type conditions) that we want the graphical 
structure of the tree to encode. In addition, we want them (iv) to be as conservative (small) 
as possible. 

In this list, the only item that needs more explanation concerns the Markov-type condi- 
tions that the tree structure encodes. This is what we turn to now. 

2.4. The interpretation of the graphical model. In classical Bayesian nets, the graphical 
structure is taken to represent the following assessments: for any node s, conditional on its 
parent variables, its non-parent non-descendant variables are epistemically irrelevant to it 
(and therefore also independent). 

In the present context, we assume that the tree structure embodies the following conditional 
irrelevance assessment, which turns out to be equivalent to the conditional independence 
assessment above in the special case of a Bayesian tree. 

CI. Consider any node $s$ in the tree, any subset $S$ of its set of children $C(s)$, and the set 
$\overline{S} := \bigcap_{c \in S}\overline{c}$ of their common non-parent non-descendants. Then, conditional on the 
mother variable $X_s$, the non-parent non-descendant variables $X_{\overline{S}}$ are assumed to be 
epistemically irrelevant to the variables $X_{\downarrow S}$ associated with the children in $S$ and their 
descendants. 

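This interpretation turns the tree into a credal tree under epistemic irrelevance, and we 
also introduce the term imprecise Markov tree (IMT) for it. For the global models we are 
considering here, CI has the following consequences. It implies that for all $s \in T^{\lozenge}$, all non-empty 
$S \subseteq C(s)$ and all $I \subseteq \overline{S}$, we can infer from $\underline{P}_{\downarrow S}(\cdot|X_s)$ a model $\underline{P}_{\downarrow S}(\cdot|X_{\{s\} \cup I})$, where for 
all $z_{\{s\} \cup I} \in \mathcal{X}_{\{s\} \cup I}$, with obvious notations:¹⁰ 

$$\underline{P}_{\downarrow S}(f|z_{\{s\} \cup I}) := \underline{P}_{\downarrow S}(f(\cdot, z_I)|z_s) \quad \text{for all gambles } f \text{ in } \mathcal{L}(\mathcal{X}_{{\downarrow S} \cup I}), \qquad (2)$$

where $f(\cdot, z_I)$ denotes a partial map of $f$, defined on $\mathcal{X}_{\downarrow S}$. 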

We discuss some of the separation properties that accompany this interpretation in Sec- 
tion 5. For now, we focus on two immediate consequences that will help us go from local 
to global models in Section 4. 

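First, consider some node $s$. Then CI tells us that for any two distinct children $c_1, c_2 \in C(s)$ of $s$, 
the variable $X_{\downarrow c_1}$ is epistemically irrelevant to the variable $X_{\downarrow c_2}$, conditional on $X_s$. 

[Figure: the variable $X_s$ and the variables $X_{\downarrow c}$ of the sub-trees of its children $c \in C(s)$.] 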

It even tells us that for any two disjoint non-empty sets $S_1 \subseteq C(s)$ and $S_2 \subseteq C(s)$ of children 
of $s$, the variable $X_{\downarrow S_1}$ is epistemically irrelevant to $X_{\downarrow S_2}$, conditional on $X_s$. We conclude 
that, conditional on a node, all its children $c$ (and the variables associated with their sub-trees 
${\downarrow}c$) are epistemically independent [24, Chapter 9], in the specific sense to be discussed 
in the next section. 

Next, consider some non-terminal node $s$ different from $\square$, and its mother variable $X_{m(s)}$. 
We infer from CI that this mother variable is epistemically irrelevant to the variable 
$X_{\downarrow C(s)}$, conditional on $X_s$: 

[Figure: the mother variable $X_{m(s)}$, the variable $X_s$ and the variable $X_{\downarrow C(s)}$, shown in two equivalent representations.] 

¹⁰ For leaves $s$, the corresponding irrelevance condition is trivial, as the set $C(s)$ of children of $s$ is empty. 

3. Independent natural extension 

Let us make a small digression on epistemic independence, which will help us in our 
discussion further on. The material in this section is based on work that some of us have 
published elsewhere [11], and we refer to that paper for more details and proofs for the 
results mentioned in this section. 

3.1. Independent products. Suppose we have a number of (separately) coherent marginal 
lower previsions $\underline{P}_n$ on $\mathcal{L}(\mathcal{X}_n)$ representing beliefs about the values that each of 
a finite number of (logically independent) variables $X_n$ assume in the respective non-empty 
finite sets $\mathcal{X}_n$, $n \in N$, where $N$ is some non-empty finite set. 

We want to construct a joint lower prevision $\underline{P}_N$ on $\mathcal{L}(\mathcal{X}_N)$, where $\mathcal{X}_N := \times_{n \in N}\mathcal{X}_n$, that 
coincides with the marginals $\underline{P}_n$ on their respective domains $\mathcal{L}(\mathcal{X}_n)$, and such that this $\underline{P}_N$ 
reflects the following structural assessments: for any disjoint proper subsets $O$ and $I$ of $N$, 
the variables $X_I$ are epistemically irrelevant to the variables $X_O$. In other words, learning 
the value of any number of these variables would not affect beliefs about the remaining 
variables. We then call the variables $X_n$, $n \in N$, epistemically independent. 

Generally speaking, such irrelevance assessments are useful because they allow us to 
turn unconditional into conditional lower previsions. In particular, for any disjoint proper 
subsets $O$ and $I$ of $N$, we can use the epistemic irrelevance assessment of $X_I$ to $X_O$ to infer 
from the joint lower prevision $\underline{P}_N$ a conditional lower prevision $\underline{P}_O(\cdot|X_I)$ on $\mathcal{L}(\mathcal{X}_{O \cup I})$ 
given by: 

$$\underline{P}_O(h|z_I) := \underline{P}_N(h(\cdot, z_I)) \quad \text{for all gambles } h \text{ on } \mathcal{X}_{O \cup I} \text{ and all } z_I \in \mathcal{X}_I.$$

So we can use the symmetrised assessment of epistemic independence of the variables $X_n$, 
$n \in N$, to infer from $\underline{P}_N$ the following family of conditional lower previsions: 

$$\mathcal{I}(\underline{P}_N) := \{\underline{P}_O(\cdot|X_I) : O \text{ and } I \text{ disjoint proper subsets of } N\}.$$
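The size of this family is easy to underestimate. The Python sketch below enumerates the pairs $(O, I)$ indexing $\mathcal{I}(\underline{P}_N)$, assuming (as in Section 2.1) that $O$ is non-empty while $I$ may be empty, the latter corresponding to an unconditional marginal. The function name is hypothetical.

```python
from itertools import product


def irrelevance_family(N):
    """All pairs (O, I) of disjoint proper subsets of N with O non-empty; each pair
    indexes one conditional lower prevision P_O(.|X_I) in the family."""
    nodes = sorted(N)
    full = frozenset(nodes)
    pairs = []
    # Assign every element of N to O, to I, or to neither.
    for labels in product(("O", "I", "-"), repeat=len(nodes)):
        O = frozenset(n for n, lab in zip(nodes, labels) if lab == "O")
        I = frozenset(n for n, lab in zip(nodes, labels) if lab == "I")
        if O and O != full and I != full:
            pairs.append((O, I))
    return pairs
```

Under these assumptions each element of $N$ goes to $O$, to $I$ or to neither, so the family has $3^{|N|} - 2^{|N|} - 1$ members (4 for $|N| = 2$, 18 for $|N| = 3$).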

This idea leads to the definition of an independent product, which generalises the existing 
notion for (precise) probability models. 

Definition 1. A (separately) coherent lower prevision $\underline{P}_N$ on $\mathcal{L}(\mathcal{X}_N)$ that coincides with 
the marginal lower previsions $\underline{P}_n$ on their domains $\mathcal{L}(\mathcal{X}_n)$, $n \in N$, and that is coherent 
with the family of conditional lower previsions $\mathcal{I}(\underline{P}_N)$ is called an independent product¹¹ 
of these marginals $\underline{P}_n$. 

It turns out that there always is a point- wise smallest independent product: 

Proposition 1. Any collection of (separately) coherent lower previsions $\underline{P}_n$ on $\mathcal{L}(\mathcal{X}_n)$, 
$n \in N$, has a point-wise smallest independent product. We call it their independent natural 
extension and denote it by $\bigotimes_{n \in N}\underline{P}_n$. Moreover, $\bigotimes_{n \in N}\underline{P}_n$ is a strongly factorising coherent 
lower prevision on $\mathcal{L}(\mathcal{X}_N)$. 

Strong factorisation is strongly linked with independent products, and will play a crucial 
part in our development of an algorithm for updating an imprecise Markov tree in Section 6. 
It is defined as follows: 

¹¹ In [11], we distinguish between many-to-many and many-to-one independent products. It is not necessary to 
make this distinction here, but whenever we use the term 'independent product' in the present paper, we implicitly 
refer to the more stringent many-to-many version introduced there. 



X. 



m{s) 



X, 



X 



-iC{s) 
Definition 2. We call a (separately) coherent lower prevision $\underline{P}_N$ on $\mathcal{L}(\mathcal{X}_N)$ strongly 
factorising if for all disjoint proper subsets $O$ and $I$ of $N$, all $g \in \mathcal{L}(\mathcal{X}_O)$ and all non-negative 
$f \in \mathcal{L}(\mathcal{X}_I)$, $\underline{P}_N(fg) = \underline{P}_N(f\,\underline{P}_N(g))$. 

As another important example, the so-called strong product $\boxtimes_{n \in N}\underline{P}_n$ [4] of marginal lower 
previsions $\underline{P}_n$ is strongly factorising.¹² 
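For two variables, the strong factorisation of the strong product can be checked numerically. The sketch below assumes finite credal sets given by their extreme points (hypothetical numbers) and, as is standard, takes the strong product to be the lower envelope of all products $p_1 \times p_2$ of mass functions from the two sets; minimising over products of extreme points suffices because expectations are linear. It tests Definition 2 on random gambles, with $f$ non-negative on the first coordinate and $g$ arbitrary on the second; calling it with the arguments swapped covers the other choice of $O$ and $I$. This is a sanity check under these assumptions, not a proof.

```python
import random
from itertools import product


def lower(credal_set, f):
    """Lower envelope over finitely many extreme points (dict mass functions)."""
    return min(sum(p[x] * f[x] for x in p) for p in credal_set)


def strong_product(credal_1, credal_2):
    """Extreme points of the strong product: all products p1 x p2 of extreme points."""
    return [{(a, b): p1[a] * p2[b] for a, b in product(p1, p2)}
            for p1, p2 in product(credal_1, credal_2)]


def check_strong_factorisation(credal_1, credal_2, trials=500, seed=1, tol=1e-9):
    """Test P(f g) = P(f P(g)) for non-negative f on X_1 and arbitrary g on X_2."""
    rng = random.Random(seed)
    joint = strong_product(credal_1, credal_2)
    space = list(joint[0])
    for _ in range(trials):
        f = {a: rng.uniform(0, 2) for a in credal_1[0]}    # non-negative gamble on X_1
        g = {b: rng.uniform(-1, 1) for b in credal_2[0]}   # arbitrary gamble on X_2
        lower_g = lower(joint, {(a, b): g[b] for a, b in space})
        lhs = lower(joint, {(a, b): f[a] * g[b] for a, b in space})
        rhs = lower(joint, {(a, b): f[a] * lower_g for a, b in space})
        assert abs(lhs - rhs) <= tol
    return True


C1 = [{"a": 0.2, "b": 0.8}, {"a": 0.6, "b": 0.4}]
C2 = [{0: 0.5, 1: 0.3, 2: 0.2}, {0: 0.1, 1: 0.1, 2: 0.8}]
```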

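As a consequence of the separate coherence of the joint lower prevision $\underline{P}_N$, the right-hand 
side of the equality in this definition can be rewritten as: 

$$\underline{P}_N(f\,\underline{P}_N(g)) = \begin{cases} \underline{P}_N(f)\,\underline{P}_N(g) & \text{if } \underline{P}_N(g) \geq 0, \\ \overline{P}_N(f)\,\underline{P}_N(g) & \text{if } \underline{P}_N(g) < 0, \end{cases}$$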

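which explains where the term 'factorising' comes from. In particular, for any (separately) 
coherent strongly factorising joint lower prevision $\underline{P}_N$, we see that for any partition $N_1, \dots, 
N_m$ of $N$: 

$$\underline{P}_N\Big(\times_{k=1}^{m} A_k\Big) = \prod_{k=1}^{m} \underline{P}_N(A_k) \quad \text{and} \quad \overline{P}_N\Big(\times_{k=1}^{m} A_k\Big) = \prod_{k=1}^{m} \overline{P}_N(A_k), \qquad (3)$$

where $A_k \subseteq \mathcal{X}_{N_k}$ for $k = 1, \dots, m$. 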
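Equation (3) can likewise be illustrated on the strong product of two hypothetical credal sets: the lower and upper probabilities of a product event $A_1 \times A_2$ coincide with the products of the lower and upper probabilities of $A_1 \times \mathcal{X}_2$ and $\mathcal{X}_1 \times A_2$. The sketch only covers this particular strongly factorising joint, built under the same assumptions as a lower envelope of products of extreme points; all names and numbers are illustrative.

```python
from itertools import product


def lower(credal_set, f):
    """Lower envelope over finitely many extreme points (dict mass functions)."""
    return min(sum(p[x] * f[x] for x in p) for p in credal_set)


def upper(credal_set, f):
    """Conjugate upper prevision."""
    return -lower(credal_set, {x: -v for x, v in f.items()})


def strong_product(credal_1, credal_2):
    """Extreme points of the strong product: all products p1 x p2 of extreme points."""
    return [{(a, b): p1[a] * p2[b] for a, b in product(p1, p2)}
            for p1, p2 in product(credal_1, credal_2)]


def indicator(event, space):
    """Indicator gamble of an event (a set of joint outcomes)."""
    return {x: (1.0 if x in event else 0.0) for x in space}


C1 = [{"a": 0.2, "b": 0.8}, {"a": 0.6, "b": 0.4}]
C2 = [{0: 0.5, 1: 0.3, 2: 0.2}, {0: 0.1, 1: 0.1, 2: 0.8}]
JOINT = strong_product(C1, C2)
SPACE = list(JOINT[0])

A1, A2 = {"a"}, {1, 2}                                            # A_1 in X_1, A_2 in X_2
box = indicator({(a, b) for a, b in SPACE if a in A1 and b in A2}, SPACE)
cyl_1 = indicator({(a, b) for a, b in SPACE if a in A1}, SPACE)   # A_1 x X_2
cyl_2 = indicator({(a, b) for a, b in SPACE if b in A2}, SPACE)   # X_1 x A_2

lower_box, lower_prod = lower(JOINT, box), lower(JOINT, cyl_1) * lower(JOINT, cyl_2)
upper_box, upper_prod = upper(JOINT, box), upper(JOINT, cyl_1) * upper(JOINT, cyl_2)
```

With these numbers both lower values equal 0.1 and both upper values equal 0.54.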

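The independent natural extension has very interesting and non-trivial marginalisation 
and associativity properties. Consider any non-empty subset $R$ of $N$; then the independent 
natural extension $\bigotimes_{r \in R}\underline{P}_r$ of the marginals $\underline{P}_r$, $r \in R$, coincides with the restriction of 
$\bigotimes_{n \in N}\underline{P}_n$ to the set of gambles $\mathcal{L}(\mathcal{X}_R)$: 

$$\Big(\bigotimes_{r \in R}\underline{P}_r\Big)(g) = \Big(\bigotimes_{n \in N}\underline{P}_n\Big)(g) \quad \text{for all gambles } g \text{ on } \mathcal{X}_R. \qquad (4)$$

Moreover, for any partition $N_1$ and $N_2$ of $N$, we have that 

$$\bigotimes_{n \in N}\underline{P}_n = \Big(\bigotimes_{n_1 \in N_1}\underline{P}_{n_1}\Big) \otimes \Big(\bigotimes_{n_2 \in N_2}\underline{P}_{n_2}\Big), \qquad (5)$$

so $\bigotimes_{n \in N}\underline{P}_n$ is the independent natural extension of its $\mathcal{X}_{N_1}$-marginal $\bigotimes_{n \in N_1}\underline{P}_n$ and its 
$\mathcal{X}_{N_2}$-marginal $\bigotimes_{n \in N_2}\underline{P}_n$. 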

3.2. Regular extension. As a next step, suppose we want to condition a separately coherent 
and strongly factorising joint $\underline{P}_N$ on observations of the type $X_I = z_I$, where $I$ is some 
proper subset of $N$. In other words, we want to find conditional lower previsions $\underline{P}_O(\cdot|X_I)$ 
on $\mathcal{L}(\mathcal{X}_{I \cup O})$ that are (jointly) coherent with the joint lower prevision $\underline{P}_N$. To this end, we 
calculate the so-called regular extension as follows. Consider $z_I$ in $\mathcal{X}_I$. When $\overline{P}_N(\{z_I\}) > 0$, 

$$\underline{R}(h|z_I) := \max\{\mu \in \mathbb{R} : \underline{P}_N(I_{\{z_I\}}[h - \mu]) \geq 0\},$$

where $O$ is any non-empty subset of $N \setminus I$ and $h$ is any gamble on $\mathcal{X}_{I \cup O}$. When $\overline{P}_N(\{z_I\}) = 
0$, $\underline{R}(\cdot|z_I)$ is vacuous, meaning that $\underline{R}(h|z_I) = \min_{x_O \in \mathcal{X}_O} h(z_I, x_O)$ for all gambles $h$ on $\mathcal{X}_{I \cup O}$. 

Generally speaking, coherence only determines $\underline{P}_O(\cdot|z_I)$ uniquely if $\underline{P}_N(\{z_I\}) > 0$, and 
in that case regular extension yields this uniquely coherent conditional lower prevision: 
$\underline{P}_O(\cdot|z_I) = \underline{R}(\cdot|z_I)$. When $\underline{P}_N(\{z_I\}) = 0$, regular extension is still coherent, and it even still 
characterises the coherent $\underline{P}_O(\cdot|z_I)$, because these all lie between the vacuous lower prevision 
and $\underline{R}(\cdot|z_I)$. For more details about this regular extension, we refer to [24, Appendix J] 
and [16, Section 4]. 

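If the joint $\underline{P}_N$ is strongly factorising, we get: 

$$\underline{P}_N(I_{\{x_I\}}[h - \mu]) = \underline{P}_N\big(I_{\{x_I\}}\,\underline{P}_N(h(x_I, \cdot) - \mu)\big) = \begin{cases} \underline{P}_N(\{x_I\})\,[\underline{P}_N(h(x_I, \cdot)) - \mu] & \text{if } \underline{P}_N(h(x_I, \cdot)) \geq \mu, \\ \overline{P}_N(\{x_I\})\,[\underline{P}_N(h(x_I, \cdot)) - \mu] & \text{if } \underline{P}_N(h(x_I, \cdot)) < \mu, \end{cases}$$

so we conclude that, quite interestingly, 

$$\underline{R}(h|x_I) = \underline{P}_N(h(x_I, \cdot)) \quad \text{as soon as } \overline{P}_N(\{x_I\}) > 0. \qquad (6)$$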



¹² This type of independent product comes to the fore in a study of credal nets under strong independence. 
In other words, the conditional lower previsions found by regular extension of a strongly 
factorising joint satisfy all epistemic irrelevance conditions present in an assessment of 
epistemic independence. We shall have occasion to use this idea several times in the course 
of this paper, especially in the proofs. 
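The conclusion (6) can be checked numerically. The Python sketch below computes $\underline{R}(h|z_I)$ directly from the defining maximum, using bisection on $\mu$ (an implementation choice made for this sketch, justified because $\mu \mapsto \underline{P}_N(I_{\{z_I\}}[h - \mu])$ is non-increasing), for strong products of hypothetical credal sets with $I = \{1\}$. It covers three cases for the event $\{X_1 = a\}$: positive lower probability, zero lower but positive upper probability, and zero upper probability (the vacuous case). All names and numbers are illustrative.

```python
from itertools import product


def lower(credal_set, f):
    """Lower envelope over finitely many extreme points (dict mass functions)."""
    return min(sum(p[x] * f[x] for x in p) for p in credal_set)


def upper(credal_set, f):
    """Conjugate upper prevision."""
    return -lower(credal_set, {x: -v for x, v in f.items()})


def strong_product(credal_1, credal_2):
    """Extreme points of the strong product of two credal sets."""
    return [{(a, b): p1[a] * p2[b] for a, b in product(p1, p2)}
            for p1, p2 in product(credal_1, credal_2)]


def regular_extension(joint, z, h, iterations=200):
    """R(h | X_1 = z) = max{mu : lower(I_z (h - mu)) >= 0} if the upper probability
    of {X_1 = z} is positive, and the vacuous min of h(z, .) otherwise."""
    space = list(joint[0])
    cell = [k for k in space if k[0] == z]
    if upper(joint, {k: (1.0 if k[0] == z else 0.0) for k in space}) == 0.0:
        return min(h[k] for k in cell)
    lo, hi = min(h[k] for k in cell), max(h[k] for k in cell)  # lo always feasible
    for _ in range(iterations):
        mu = (lo + hi) / 2
        shifted = {k: (h[k] - mu if k[0] == z else 0.0) for k in space}
        if lower(joint, shifted) >= 0.0:
            lo = mu
        else:
            hi = mu
    return lo


C2 = [{0: 0.5, 1: 0.3, 2: 0.2}, {0: 0.1, 1: 0.1, 2: 0.8}]
H = {("a", 0): 1.0, ("a", 1): -1.0, ("a", 2): 2.0, ("b", 0): 0.0, ("b", 1): 0.0, ("b", 2): 0.0}
H_PARTIAL = {b: H[("a", b)] for b in (0, 1, 2)}  # the partial map h(a, .)

# Lower probability of {X_1 = a} positive; zero with positive upper; zero upper.
JOINT_POS = strong_product([{"a": 0.2, "b": 0.8}, {"a": 0.6, "b": 0.4}], C2)
JOINT_ZERO_LOWER = strong_product([{"a": 0.2, "b": 0.8}, {"a": 0.0, "b": 1.0}], C2)
JOINT_ZERO_UPPER = strong_product([{"a": 0.0, "b": 1.0}], C2)
```

In the first two cases `regular_extension` agrees with `lower(C2, H_PARTIAL)` (both 0.6), as (6) predicts; in the third it returns the vacuous value $-1$.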

3.3. Conditionally independent products. To end this section, we generalise the notion 
of an independent product to that of a conditionally independent product. In this case we 
have a number of 'marginal' conditional lower previsions $\underline{P}_n(\cdot|Y)$ on $\mathcal{L}(\mathcal{X}_n)$ representing 
beliefs (conditional on a variable $Y$ in a finite set $\mathcal{Y}$) about the values that each of a finite 
number of (logically independent) variables $X_n$ assume in the respective non-empty finite 
sets $\mathcal{X}_n$, $n \in N$. 

We want to construct a conditional lower prevision $\underline{P}_N(\cdot|Y)$ on $\mathcal{L}(\mathcal{X}_N)$, where $\mathcal{X}_N = 
\times_{n \in N}\mathcal{X}_n$, that coincides with the marginal conditional lower previsions $\underline{P}_n(\cdot|Y)$ on their 
respective domains $\mathcal{L}(\mathcal{X}_n)$, and such that this $\underline{P}_N(\cdot|Y)$ reflects the following structural 
assessments: for any disjoint proper subsets $O$ and $I$ of $N$, the variables $X_I$ are epistemically 
irrelevant to the variables $X_O$, conditional on $Y$. In other words, if the value of $Y$ were known, 
then learning the value of any number of these variables would not affect beliefs about 
the remaining variables. We then call the variables $X_n$, $n \in N$, epistemically independent, 
conditional on $Y$. 

Generally speaking, such conditional irrelevance assessments are useful because they allow us to turn lower previsions conditional on $Y$ alone into other, more involved conditional lower previsions. In particular, for any disjoint proper subsets $O$ and $I$ of $N$, we can use the epistemic irrelevance assessment of $X_I$ to $X_O$ conditional on $Y$ to infer from the joint lower prevision $\underline{P}_N(\cdot|Y)$ a conditional lower prevision $\underline{P}_O(\cdot|X_I,Y)$ on $\mathcal{L}(\mathcal{X}_{O\cup I})$ [or equivalently on $\mathcal{L}(\mathcal{X}_{O\cup I}\times\mathcal{Y})$] given by:

$$\underline{P}_O(h|z_I,y) := \underline{P}_N(h(\cdot,z_I)|y) \text{ for all gambles } h \text{ on } \mathcal{X}_{O\cup I} \text{ and all } (z_I,y)\in\mathcal{X}_I\times\mathcal{Y}.$$

So we can use the symmetrised assessment of epistemic independence of the variables $X_n$, $n\in N$ conditional on $Y$ to infer from $\underline{P}_N(\cdot|Y)$ the following family of conditional lower previsions:

$$\mathcal{I}(\underline{P}_N(\cdot|Y)) := \{\underline{P}_O(\cdot|X_I,Y) : O \text{ and } I \text{ disjoint proper subsets of } N\}.$$
This idea leads to the definition of a conditionally independent product. 

Definition 3. A (separately) coherent conditional lower prevision $\underline{P}_N(\cdot|Y)$ on $\mathcal{L}(\mathcal{X}_N)$ that coincides with the 'marginal' conditional lower previsions $\underline{P}_n(\cdot|Y)$ on their domains $\mathcal{L}(\mathcal{X}_n)$, $n\in N$, and that is coherent with the family of conditional lower previsions $\mathcal{I}(\underline{P}_N(\cdot|Y))$ is called a conditionally independent product of these marginals $\underline{P}_n(\cdot|Y)$.

It turns out that there always is a point-wise smallest conditionally independent product:

Proposition 2. Any collection of (separately) coherent conditional lower previsions $\underline{P}_n(\cdot|Y)$ on $\mathcal{L}(\mathcal{X}_n)$, $n\in N$, has a point-wise smallest conditionally independent product. We call it their conditionally independent natural extension and denote it by $\bigotimes_{n\in N}\underline{P}_n(\cdot|Y)$.

The notation we use for the conditionally independent natural extension is appropriately suggestive: for each $y$ in $\mathcal{Y}$, $\bigotimes_{n\in N}\underline{P}_n(\cdot|y)$ is indeed the independent natural extension of the marginal lower previsions $\underline{P}_n(\cdot|y)$. This implies that each $\bigotimes_{n\in N}\underline{P}_n(\cdot|y)$ is a strongly factorising coherent lower prevision on $\mathcal{L}(\mathcal{X}_N)$.

We are now ready to go back to our discussion of imprecise Markov trees. 

4. Constructing the most conservative joint 

Let us show how to construct specific global models for the variables in the tree, and argue that these are the most conservative coherent models that extend the local models and express all conditional irrelevancies (2) encoded in the imprecise Markov tree. In Section 6, we will use these global models to construct and justify an algorithm for updating the imprecise Markov tree.

The crucial step lies in the recognition that any tree can be constructed recursively from 
the leaves up to the root, by using basic building blocks of the following type: 



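$$X_{m(s)}\rightarrow X_s: \quad \underline{Q}_s(\cdot|X_{m(s)})$$

$$X_s\rightarrow X_{\downarrow c_1},\,X_{\downarrow c_2},\,\dots,\,X_{\downarrow c_n}: \quad \underline{P}_{\downarrow c}(\cdot|X_s),\ c\in C(s)$$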

The global models are then also constructed recursively, following the same pattern. In what follows, we first derive the recursion equations for these global models in a heuristic manner. The real justification for using the global models thus derived is then given in Theorem 5.

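Consider a node $s$ and suppose that, in each of its children $c\in C(s)$, we already have a global conditional lower prevision $\underline{P}_{\downarrow c}(\cdot|X_s)$ on $\mathcal{L}(\mathcal{X}_{\downarrow c})$ [or equivalently, on $\mathcal{L}(\mathcal{X}_{\{s\}\cup\downarrow c})$].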

Given that, conditional on $X_s$, the variables $X_{\downarrow c}$, $c\in C(s)$ are epistemically independent [see Section 2.4, condition CI], the discussion in Section 3 leads us to combine the 'marginals' $\underline{P}_{\downarrow c}(\cdot|X_s)$, $c\in C(s)$ into their point-wise smallest conditionally independent product (conditionally independent natural extension) $\bigotimes_{c\in C(s)}\underline{P}_{\downarrow c}(\cdot|X_s)$, which is a conditional lower prevision $\underline{P}_{\downarrow C(s)}(\cdot|X_s)$ on $\mathcal{L}(\mathcal{X}_{\downarrow C(s)})$ [or equivalently, on $\mathcal{L}(\mathcal{X}_{\downarrow s})$]:

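$$X_s\rightarrow X_{\downarrow C(s)}: \quad \bigotimes_{c\in C(s)}\underline{P}_{\downarrow c}(\cdot|X_s) =: \underline{P}_{\downarrow C(s)}(\cdot|X_s)$$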

Next, we need to combine the conditional models $\underline{Q}_s(\cdot|X_{m(s)})$ and $\underline{P}_{\downarrow C(s)}(\cdot|X_s)$ into a global conditional model about $X_{\downarrow s}$. Given that, conditional on $X_s$, the variable $X_{m(s)}$ is epistemically irrelevant to the variable $X_{\downarrow C(s)}$ [see Section 2.4, condition CI], we expect $\underline{P}_{\downarrow C(s)}(\cdot|X_{\{m(s),s\}})$ and $\underline{P}_{\downarrow C(s)}(\cdot|X_s)$ to coincide [this is a special instance of Eq. (2)]. The most conservative (point-wise smallest) coherent way of combining the conditional lower previsions $\underline{P}_{\downarrow C(s)}(\cdot|X_{\{m(s),s\}})$ and $\underline{Q}_s(\cdot|X_{m(s)})$ consists in taking their marginal extension$^{12}$ $\underline{Q}_s(\underline{P}_{\downarrow C(s)}(\cdot|X_{\{m(s),s\}})|X_{m(s)}) = \underline{Q}_s(\underline{P}_{\downarrow C(s)}(\cdot|X_s)|X_{m(s)})$; see [17, 24] for more details. Graphically:

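$$X_{m(s)}\rightarrow X_{\downarrow s}: \quad \underline{Q}_s\big(\underline{P}_{\downarrow C(s)}(\cdot|X_s)\,\big|\,X_{m(s)}\big) =: \underline{P}_{\downarrow s}(\cdot|X_{m(s)})$$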
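Marginal extension can be computed in two stages: first the inner conditional lower prevision for each value of the conditioning variable, which yields a gamble on that variable, then the outer lower prevision of that gamble. The Python sketch below shows the simplest instance, $\underline{Q}_1(\underline{Q}_2(f|X_1))$ for an unconditional model of $X_1$ and a conditional model of $X_2$ given $X_1$, which is the pattern behind the marginal extension above when $s$ has a single child. It assumes binary variables and credal sets given by finitely many extreme points; the names `Q1`, `Q2` and `marginal_extension` and all numbers are hypothetical.

```python
# Hypothetical local models, each a list of extreme points:
# Q1 for X1, and Q2[x1] for X2 given X1 = x1 (binary variables).
Q1 = [(0.4, 0.6), (0.7, 0.3)]
Q2 = {0: [(0.2, 0.8), (0.5, 0.5)],
      1: [(0.9, 0.1), (0.6, 0.4)]}


def lower(points, values):
    """Lower prevision of a gamble (list of values) for a finite set of extreme points."""
    return min(sum(p * v for p, v in zip(pt, values)) for pt in points)


def marginal_extension(f):
    """Q1(Q2(f(X1, .)|X1)): the inner conditional lower prevision gives a gamble
    on X1, to which the outer lower prevision is then applied."""
    inner = [lower(Q2[x1], [f(x1, x2) for x2 in (0, 1)]) for x1 in (0, 1)]
    return lower(Q1, inner)


print(marginal_extension(lambda x1, x2: float(x2 == 1)))  # lower probability of X2 = 1, about 0.26
```

With singleton credal sets, `lower` is an ordinary expectation and the function reduces to the law of iterated expectations.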

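Summarising, and also accounting for the case $s=\square$, we can construct a global conditional lower prevision on $\mathcal{L}(\mathcal{X}_T)$ by backwards recursion:

$$\underline{P}_{\downarrow C(s)}(\cdot|X_s) := \bigotimes_{c\in C(s)}\underline{P}_{\downarrow c}(\cdot|X_s) \qquad (7)$$

$$\underline{P}_{\downarrow s}(\cdot|X_{m(s)}) := \underline{Q}_s\big(\underline{P}_{\downarrow C(s)}(\cdot|X_s)\,\big|\,X_{m(s)}\big) \qquad (8)$$

for all $s\in T^{\lozenge}$. If we start with the 'boundary conditions'

$$\underline{P}_{\downarrow t}(\cdot|X_{m(t)}) := \underline{Q}_t(\cdot|X_{m(t)}) \text{ for all leaves } t, \qquad (9)$$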



$^{12}$Marginal extension is, in the special case of precise probability models, also known as the law of total probability, or the law of iterated expectations.
then the recursion relations (7) and (8) eventually lead to the global joint model $\underline{P} = \underline{P}_{\downarrow\square}(\cdot|X_{m(\square)})$, and to the global conditional models $\underline{P}_{\downarrow C(s)}(\cdot|X_s)$ for all non-terminal nodes $s$.
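For any subset $S\subseteq C(s)$, the global conditional model $\underline{P}_{\downarrow S}(\cdot|X_s)$ can then be defined simply as the restriction of the model $\underline{P}_{\downarrow C(s)}(\cdot|X_s)$ on $\mathcal{L}(\mathcal{X}_{\downarrow C(s)})$ to the set $\mathcal{L}(\mathcal{X}_{\downarrow S})$:

$$\underline{P}_{\downarrow S}(g|X_s) := \underline{P}_{\downarrow C(s)}(g|X_s) \text{ for all gambles } g \text{ on } \mathcal{X}_{\downarrow S}. \qquad (10)$$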
It follows from the discussion in Section 3 that, alternatively [see Eq. (4)],

$$\underline{P}_{\downarrow S}(\cdot|X_s) = \bigotimes_{c\in S}\underline{P}_{\downarrow c}(\cdot|X_s). \qquad (11)$$

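For easy reference, we will in what follows refer to this collection of global models as the family of global models $\mathcal{F}(\underline{P})$, so

$$\mathcal{F}(\underline{P}) := \{\underline{P}\}\cup\{\underline{P}_{\downarrow S}(\cdot|X_s) : s\in T^{\lozenge} \text{ and non-empty } S\subseteq C(s)\}.$$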

We end this section by discussing a number of interesting properties for the family of global models $\mathcal{F}(\underline{P})$ we can derive in this way. Let us call any real functional $\Phi$ on $\mathcal{L}(\mathcal{X})$ strictly positive if $\Phi(\mathbb{I}_{\{x\}})>0$ for all $x\in\mathcal{X}$.

Proposition 3. If all the local models $\underline{Q}_s(\cdot|X_{m(s)})$, $s\in T$ are strictly positive, then so are all the global models in $\mathcal{F}(\underline{P})$.

Proposition 4. Consider any non-empty subset $E$ of $T$ and any $x_E\in\mathcal{X}_E$. If $\overline{P}(\{x_E\})>0$ then also $\overline{P}_{\downarrow c}(\{x_{E\cap\downarrow c}\}|x_e)>0$ for all $e\in E$ and all $c\in C(e)$.$^{13}$

Before we formulate the most important result in this section (and arguably, in this paper), we provide some motivation. Suppose we have some family of global models

$$\mathcal{F}(\underline{V}) := \{\underline{V}\}\cup\{\underline{V}_{\downarrow S}(\cdot|X_s) : s\in T^{\lozenge} \text{ and non-empty } S\subseteq C(s)\}$$

associated with the tree. How do we express that such a family is compatible with the assessments encoded in the tree?

First of all, we require that our global models should extend the local models:
T1. For each $s\in T$, $\underline{Q}_s(\cdot|X_{m(s)})$ is the restriction of $\underline{V}_{\downarrow s}(\cdot|X_{m(s)})$ to $\mathcal{L}(\mathcal{X}_s)$.

The second requirement is that our models should satisfy the rationality requirement of coherence:

T2. The (conditional) lower previsions in $\mathcal{F}(\underline{V})$ are jointly coherent.
The third requirement needs more explanation: the global models should reflect all epistemic irrelevancies encoded in the graphical structure of the tree. Naively, we would want condition (2) to be satisfied. The problem is that only the right-hand side in Eq. (2), involving the model $\underline{V}_{\downarrow S}(\cdot|X_s)$, is directly available to us. To get to the left-hand side involving the model $\underline{V}_{\downarrow S}(\cdot|X_{\{s\}\cup I})$, one naive approach would be to 'condition the joint model $\underline{V}$ on the variable $X_{\{s\}\cup I}$'. But we have seen in Section 3.1 that, given a joint model, coherence in general only determines the conditional models uniquely provided that the lower probability of the conditioning event is non-zero. This is a fairly strong condition, and in what follows we would generally prefer to work with the much weaker condition that the upper probability of the conditioning event is non-zero.$^{14}$ Since in that case the left-hand side of Eq. (2) need not be uniquely determined from the joint $\underline{V}$ by coherence, this approach becomes infeasible.

Nevertheless, as soon as we realise that all we can reasonably require from our models is that they should be coherent, the right approach readily suggests itself: we should require that if we use the available models $\underline{V}_{\downarrow S}(\cdot|X_s)$ to define the models $\underline{V}_{\downarrow S}(\cdot|X_{\{s\}\cup I})$ through the epistemic irrelevance condition (2), then the result should still be coherent:$^{15}$

T3. If we define the conditional lower previsions $\underline{V}_{\downarrow S}(\cdot|X_{\{s\}\cup R})$, $s\in T^{\lozenge}$, $S\subseteq C(s)$ and $R\subseteq\overline{S}$ through the epistemic irrelevance requirements

$$\underline{V}_{\downarrow S}(f|z_{\{s\}\cup R}) := \underline{V}_{\downarrow S}(f(\cdot,z_R)|z_s) \text{ for all gambles } f \text{ in } \mathcal{L}(\mathcal{X}_{\downarrow S\cup R}),$$

then all these models together should be (jointly) coherent with all the available models in the family $\mathcal{F}(\underline{V})$.

$^{13}$Observe that this holds trivially also if $E\cap{\downarrow}c=\emptyset$, because then $\mathcal{X}_{E\cap\downarrow c} = \mathcal{X}_\emptyset$ is a singleton [see footnote 4] whose upper probability should be 1 by separate coherence.

$^{14}$As the results in [11] suggest, it might be possible to go even further, and prove a counterpart to Theorem 5 with no positivity restrictions on the local models. We leave this as an avenue for future research, however.

$^{15}$This is also the approach implicit in Definition 1, as well as the one used in [11]. It coincides with the usual, naive approach as soon as all the relevant conditional models are uniquely determined from the joint by coherence.

And there is a final requirement, which guarantees that all inferences we make on the basis 
of our global models are as conservative as possible, and are therefore based on no other 
considerations than what is encoded in the tree: 

T4. The models in the family $\mathcal{F}(\underline{V})$ are dominated (point-wise) by the corresponding models in all other families satisfying requirements T1-T3.
It turns out that the family of models $\mathcal{F}(\underline{P})$ we have been constructing above satisfies all these requirements.

Theorem 5. If all local models $\underline{Q}_s(\cdot|X_{m(s)})$ on $\mathcal{L}(\mathcal{X}_s)$, $s\in T$ are strictly positive, then the family of global models $\mathcal{F}(\underline{P})$, obtained through Eqs. (7)-(10), constitutes the point-wise smallest family of (conditional) lower previsions that satisfy T1-T3. It is therefore the unique family to also satisfy T4. Finally, consider any non-empty set of nodes $E\subseteq T$ and the corresponding conditional lower prevision derived by applying regular extension:$^{16}$

$$\underline{R}(f|x_E) := \max\{\mu\in\mathbb{R} : \underline{P}(\mathbb{I}_{\{x_E\}}[f-\mu])\geq0\} \text{ for all } f\in\mathcal{L}(\mathcal{X}_T) \text{ and all } x_E\in\mathcal{X}_E.$$

Then the conditional lower prevision $\underline{R}(\cdot|X_E)$ is (jointly) coherent with the global models in the family $\mathcal{F}(\underline{P})$.

The last statement of this theorem guarantees that if we use regular extension to update the tree given evidence $X_E=x_E$, i.e., derive conditional models $\underline{R}(\cdot|x_E)$ from the joint model $\underline{P}$, such inferences will always be coherent. This is of particular relevance for the discussion in Section 6, where we derive an efficient algorithm for updating the tree using regular extension. It implies in particular that our algorithm produces coherent inferences.

5. Some separation properties 

Without going into too much detail, we would like to point out some of the more striking 
differences between the separation properties in imprecise Markov trees under epistemic 
irrelevance, and the more usual ones that are valid for Bayesian nets [20], which, by the 
way, are also inherited from Bayesian nets by credal nets under strong independence [4]. 

It is clear from the interpretation of the graphical model described in Section 2.4 that we have the following simple separation results:

$$X_{i_1}\rightarrow X_{i_2}\rightarrow X_t \qquad\qquad X_{i_1}\leftarrow X_{i_2}\rightarrow X_t$$

where in both cases, $X_{i_2}$ separates $X_t$ from $X_{i_1}$: when the value of $X_{i_2}$ is known, additional information about the value of $X_{i_1}$ does not affect beliefs about the value of $X_t$. In this figure, between $i_1$ and $i_2$, and between $i_2$ and $t$, there may be other nodes, but the arrows along the path segment through these nodes should all point in the indicated directions. The underlying idea is that $t$ is a (descendant of some) child $c$ of $i_2$, and conditional on the mother $i_2$ of $c$, the non-parent non-descendant $i_1$ of $c$ is epistemically irrelevant to $c$ and all of its descendants.

On the other hand, and in contradistinction with what we are used to in Bayesian nets, 
we will not generally have separation in the following configuration: 

$^{16}$If we look at the proof of this result in the Appendix, it is not hard to see that similar statements can be made about the (joint) coherence of the regular extensions $\underline{R}(\cdot|X_{E_k})$ for any finite collection $E_k$, $k=1,\dots,n$ of sets of nodes.
$$X_{i_1}\leftarrow X_{i_2}\leftarrow X_t$$

where $X_{i_2}$ does not necessarily separate $X_t$ from $X_{i_1}$. We will come across a simple counterexample in Section 7. Where does this difference with the case of Bayesian nets originate? It is clear from the reasoning above that $X_{i_2}$ separates $X_{i_1}$ from $X_t$: conditional on $X_{i_2}$, $X_t$ is epistemically irrelevant to $X_{i_1}$. For precise probability models, irrelevance generally implies symmetrical independence, and therefore this will generally imply that, conditional on $X_{i_2}$, $X_{i_1}$ is epistemically irrelevant to $X_t$ as well. But for imprecise probability models no such symmetry is guaranteed [3], and we therefore cannot infer that, generally speaking, $X_{i_2}$ will separate $X_t$ from $X_{i_1}$. As a general rule, we can only infer separation if the arrows point from the 'separating' variable $X_{i_2}$ towards the 'target' variable $X_t$.



6. A fast algorithm for updating in an imprecise Markov tree

We now consider the case where we are interested in making inferences about the value of the variable $X_t$ in some target node $t$, when we know the values $x_E$ of the variables $X_E$ in a set $E\subseteq T\setminus\{t\}$ of evidence nodes; see for instance Fig. 1.

6.1. The formulation of the problem. If we assume that the values of the remaining variables are missing at random, then we can do this by conditioning the joint $\underline{P}$ obtained above on the available evidence '$X_E=x_E$'; see for instance [12, 29].

We will address this problem by updating the lower prevision $\underline{P}$ to the lower prevision $\underline{R}_t(\cdot|x_E)$ on $\mathcal{L}(\mathcal{X}_t)$ using regular extension [24, Appendix J]:

$$\underline{R}_t(g|x_E) := \max\{\mu\in\mathbb{R} : \underline{P}(\mathbb{I}_{\{x_E\}}[g-\mu])\geq0\} \qquad (12)$$

for all gambles $g$ on $\mathcal{X}_t$, assuming that $\overline{P}(\{x_E\})>0$. Theorem 5 guarantees that such inferences are coherent. Sufficient conditions on the local models for this positivity assumption to hold are given in Proposition 3.
Consider the map

$$\rho_g:\mathbb{R}\to\mathbb{R}:\mu\mapsto\underline{P}(\mathbb{I}_{\{x_E\}}[g-\mu]).$$

We can infer from the separate coherence of $\underline{P}$ that $|\rho_g(\mu_1)-\rho_g(\mu_2)|\leq|\mu_1-\mu_2|\,\overline{P}(\{x_E\})$ for all $\mu_1,\mu_2\in\mathbb{R}$, which implies that $\rho_g$ is (Lipschitz) continuous. Separate coherence of $\underline{P}$ also guarantees that $\rho_g$ is concave and non-increasing. Hence $\{\mu\in\mathbb{R}:\rho_g(\mu)\geq0\} = (-\infty,\underline{R}_t(g|x_E)]$, which shows that the supremum that we should a priori have used in (12) is indeed a maximum. $\underline{R}_t(g|x_E)$ is the right-most zero of $\rho_g$, and it is, again by separate coherence of $\underline{P}$, guaranteed to lie between the smallest value $\min g$ and the largest value $\max g$ of $g$. If moreover $\underline{P}(\{x_E\})>0$, then separate coherence of $\underline{P}$ implies that $\underline{R}_t(g|x_E)$ is the unique zero of $\rho_g$. If on the other hand $\underline{P}(\{x_E\})=0$, then $(-\infty,\underline{R}_t(g|x_E)]$ is the set of all zeros of $\rho_g$. It appears that any algorithm for calculating $\underline{R}_t(g|x_E)$ will benefit from being able to calculate the values of $\rho_g$, or even more simply check their signs, efficiently.
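Since $\underline{R}_t(g|x_E)$ is the right-most zero of a concave, non-increasing $\rho_g$ that is non-negative at $\min g$, it can be found by bisection on $[\min g,\max g]$ once $\rho_g(\mu)$ can be evaluated. The Python sketch below does this for a generic finite space rather than a tree: the lower prevision is assumed to be the minimum over a few hypothetical extreme points, and the conditioning event $B$ is given by its 0/1 indicator. The second call adds an extreme point with $P(B)=0$, so that the lower probability of $B$ is zero and $\rho_g$ vanishes on a whole half-line; the right-most zero is still found.

```python
def rho(mu, g, event, extreme_points):
    """rho_g(mu): lower prevision of I_B (g - mu), taken as the minimum over the
    extreme points of a finite credal set; event is the 0/1 indicator of B."""
    return min(sum(p[x] * event[x] * (g[x] - mu) for x in range(len(g)))
               for p in extreme_points)


def regular_extension(g, event, extreme_points, tol=1e-12):
    """Right-most zero of the concave, non-increasing rho_g, by bisection on
    [min g, max g]; assumes the upper probability of B is positive."""
    lo, hi = min(g), max(g)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho(mid, g, event, extreme_points) >= 0:
            lo = mid   # mid lies in (-inf, R(g|B)]
        else:
            hi = mid   # mid lies to the right of R(g|B)
    return lo


g = [1.0, 0.0, 5.0]              # hypothetical gamble
B = [1, 1, 0]                    # indicator of the conditioning event
points = [(0.2, 0.3, 0.5), (0.4, 0.1, 0.5)]
print(regular_extension(g, B, points))                       # about 0.4
print(regular_extension(g, B, points + [(0.0, 0.0, 1.0)]))   # still about 0.4
```

Here the two extreme points with $P(B)>0$ give conditional expectations $0.4$ and $0.8$, and the regular extension is their minimum.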

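6.2. Calculating the values of $\rho_g$ recursively. We now recall from Section 4 that the joint $\underline{P}$ can be constructed recursively from leaves to root. The idea we now use is that calculating $\rho_g(\mu)=\underline{P}(\mathbb{I}_{\{x_E\}}[g-\mu])$ becomes easier if we graft the structure of the tree onto the argument $g^\mu := \mathbb{I}_{\{x_E\}}[g-\mu]$ as follows. Define

$$g_s := \begin{cases}\mathbb{I}_{\{x_s\}} & \text{if } s\in E,\\ g-\mu & \text{if } s=t,\\ 1 & \text{if } s\in T\setminus(E\cup\{t\}),\end{cases}$$

then $g_s\in\mathcal{L}(\mathcal{X}_s)$ and $g^\mu=\prod_{s\in T}g_s$. Also define, for any $s\in T$, the gamble $\phi_s^\mu$ on $\mathcal{X}_{\downarrow s}$ by $\phi_s^\mu := \prod_{u\in\downarrow s}g_u$. Then

$$\phi_\square^\mu = g^\mu \quad\text{and}\quad \phi_s^\mu\geq0 \text{ if } s\not\sqsubseteq t,$$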
and

$$\phi_s^\mu = g_s\prod_{c\in C(s)}\phi_c^\mu \text{ for all } s\in T, \qquad (13)$$

where we use the convention that any product over an empty set of indices equals one. Eq. (13) is the argument counterpart of Eq. (8). Also, if $s\not\sqsubseteq t$ then $g_s$ and $\phi_s^\mu$ do not depend on $\mu$, nor on $g$. Indeed, in that case

$$\phi_s^\mu = \mathbb{I}_{\{x_{E\cap\downarrow s}\}}. \qquad (14)$$
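The grafting identity $g^\mu=\prod_{s\in T}g_s$, the recursion (13) and the special case (14) can be checked by brute force on a small tree. The Python sketch below assumes a hypothetical tree with root 1, $C(1)=\{2,3\}$ and $C(2)=\{4\}$, binary variables, target $t=2$, evidence on nodes 3 and 4, and made-up values for $g$ and $\mu$; it evaluates $\phi_s^\mu$ through (13) at every joint configuration.

```python
from itertools import product

# Hypothetical tree with binary variables: root 1, C(1) = {2, 3}, C(2) = {4}.
CHILDREN = {1: (2, 3), 2: (4,), 3: (), 4: ()}
EVIDENCE = {3: 1, 4: 0}   # x_E
TARGET, MU = 2, 0.5       # target node t and a value of mu
G = (2.0, -1.0)           # gamble g on X_t


def g_s(s, x):
    """Factor of g^mu = I_{x_E}[g - mu] attached to node s."""
    if s in EVIDENCE:
        return 1.0 if x == EVIDENCE[s] else 0.0
    return G[x] - MU if s == TARGET else 1.0


def phi(s, val):
    """phi_s^mu at a joint configuration val, computed through Eq. (13)."""
    out = g_s(s, val[s])
    for c in CHILDREN[s]:
        out *= phi(c, val)
    return out


def check():
    for x in product((0, 1), repeat=4):
        val = dict(zip((1, 2, 3, 4), x))
        observed = all(val[e] == v for e, v in EVIDENCE.items())
        g_mu = G[val[TARGET]] - MU if observed else 0.0
        if phi(1, val) != g_mu:                   # phi at the root equals g^mu
            return False
        if phi(3, val) != float(val[3] == 1) or phi(4, val) != float(val[4] == 0):
            return False                          # Eq. (14) for nodes 3 and 4
    return True


print(check())  # True
```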



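First, let us consider the nodes $s\not\sqsubseteq t$. We define the messages $\underline{\pi}_s$ and $\overline{\pi}_s$ recursively by

$$\underline{\pi}_s := \underline{Q}_s\Big(g_s\prod_{c\in C(s)}\underline{\pi}_c\,\Big|\,X_{m(s)}\Big) \quad\text{and}\quad \overline{\pi}_s := \overline{Q}_s\Big(g_s\prod_{c\in C(s)}\overline{\pi}_c\,\Big|\,X_{m(s)}\Big). \qquad (15)$$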



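We summarise such a pair by the notation $\pi_s := Q_s\big(g_s\prod_{c\in C(s)}\pi_c\,\big|\,X_{m(s)}\big) := (\underline{\pi}_s,\overline{\pi}_s)$. Then there are two possibilities:

$$\pi_s = \begin{cases}Q_s(\{x_s\}|X_{m(s)})\prod_{c\in C(s)}\pi_c(x_s) & \text{if } s\in E,\\ Q_s\big(\prod_{c\in C(s)}\pi_c\,\big|\,X_{m(s)}\big) & \text{if } s\notin E.\end{cases}$$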



The messages $\underline{\pi}_s$ and $\overline{\pi}_s$ are gambles on $\mathcal{X}_{m(s)}$ and can therefore be seen as tuples of real numbers, with as many components as there are elements in $\mathcal{X}_{m(s)}$. They are all non-negative. As their notation suggests, they do not depend on the choice of $g$ or $\mu$, but only (at most) on which nodes are instantiated, i.e., belong to $E$, and on which value $x_E$ the variable $X_E$ for these instantiated nodes assumes.

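It then follows from Eqs. (8) and (13) and the strong factorisation property$^{18}$ that [see the Appendix for a proof]

$$\underline{P}_{\downarrow s}(\phi_s^\mu|X_{m(s)}) = \underline{\pi}_s \quad\text{and}\quad \overline{P}_{\downarrow s}(\phi_s^\mu|X_{m(s)}) = \overline{\pi}_s. \qquad (16)$$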
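Next, we turn to nodes $s\sqsubseteq t$. Define the messages$^{19}$ $\pi_s^\mu$ by

$$\pi_s^\mu := \underline{Q}_s(\psi_s^\mu|X_{m(s)}), \qquad (17)$$

where the gambles $\psi_s^\mu$ on $\mathcal{X}_s$ are given by the recursion relations:

$$\psi_t^\mu := \max\{g-\mu,0\}\prod_{c\in C(t)}\underline{\pi}_c + \min\{g-\mu,0\}\prod_{c\in C(t)}\overline{\pi}_c \qquad (18)$$

and, for each $s$ with $\square\neq s\sqsubseteq t$ (so that $m(s)$ exists),

$$\psi_{m(s)}^\mu := g_{m(s)}\Big[\max\{\pi_s^\mu,0\}\prod_{c\in S(s)}\underline{\pi}_c + \min\{\pi_s^\mu,0\}\prod_{c\in S(s)}\overline{\pi}_c\Big], \qquad (19)$$

where $S(s)$ denotes the siblings of $s$, i.e., the children of $m(s)$ other than $s$.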

The messages $\pi_s^\mu$ are again tuples of real numbers, with one component for each of the possible values $x_{m(s)}$ of $X_{m(s)}$. They do depend on the choice of $g$ or $\mu$, as well as on which nodes are instantiated and on which value $x_E$ the variable $X_E$ for these instantiated nodes assumes.



$^{18}$This, together with the course of reasoning leading to Eq. (20), shows that the results of updating the tree (and the algorithm we are deriving) in this way will be exactly the same for any way of forming a product of the local models for the children of $s$, provided only that this product is strongly factorising. For instance, replacing the conditionally independent natural extension with the strong product in Eq. (7) will lead to exactly the same inferences. Of course, this should not be taken to mean that our algorithm also works for updating credal trees under strong independence.

$^{19}$If $s$ is the root node, then $m(s)=\emptyset$ and $\pi_s^\mu$ is a single real number, which by Eq. (20) is equal to $\rho_g(\mu)$. See also footnote 4.
It then follows from Eqs. (8) and (13) and the strong factorisation property of the local independent products that [see the Appendix for a proof]

$$\underline{P}_{\downarrow s}(\phi_s^\mu|X_{m(s)}) = \pi_s^\mu \quad\text{and of course}\quad \rho_g(\mu) = \pi_\square^\mu. \qquad (20)$$

We conclude that we can find the value of $\rho_g(\mu)$ by a backwards recursion method consisting in passing messages up to the root of the tree, and in transforming them in each node using the local uncertainty models; see Eqs. (15) and (17)-(19).
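The Python sketch below puts Eqs. (15) and (17)-(19) together on a small hypothetical tree: root 1 with children 2 and 3, node 4 a child of 2, binary variables, target $t=2$, evidence on nodes 3 and 4, and strictly positive local models given by two extreme points each (all numbers made up). `updated_lower` finds the right-most zero of $\rho_g(\mu)=\pi_\square^\mu$ by plain bisection, not by the faster scheme discussed in Section 6.3. As an independent check, `brute_force` minimises $P(g|x_E)$ over all joints obtained by choosing one extreme point for each node and each configuration of that node's ancestors; we assume, without proving it here, that this family has the same lower envelope as the recursively constructed joint for these inferences.

```python
import math
from itertools import product

# Hypothetical tree with binary variables: root 1 with children 2 and 3, and
# node 4 a child of 2. Each local model is a list of two extreme points, indexed
# by the value of the mother node (None for the root). All numbers are made up.
MOTHER = {2: 1, 3: 1, 4: 2}
CHILDREN = {1: (2, 3), 2: (4,), 3: (), 4: ()}
Q = {1: {None: [(0.3, 0.7), (0.5, 0.5)]},
     2: {0: [(0.2, 0.8), (0.4, 0.6)], 1: [(0.6, 0.4), (0.9, 0.1)]},
     3: {0: [(0.1, 0.9), (0.3, 0.7)], 1: [(0.7, 0.3), (0.8, 0.2)]},
     4: {0: [(0.5, 0.5), (0.6, 0.4)], 1: [(0.2, 0.8), (0.35, 0.65)]}}
EVIDENCE = {3: 1, 4: 0}   # x_E
TARGET = 2                # t
ON_PATH = {1, 2}          # the nodes s that precede t


def low(pts, f):
    return min(sum(p * v for p, v in zip(pt, f)) for pt in pts)


def upp(pts, f):
    return max(sum(p * v for p, v in zip(pt, f)) for pt in pts)


def g_fix(s, x):
    """Factor g_s for s != t: the evidence indicator, or 1 for unobserved nodes."""
    return float(x == EVIDENCE[s]) if s in EVIDENCE else 1.0


def pair(s):
    """(lower, upper) messages of a node s off the path to t, Eq. (15)."""
    kids = [pair(c) for c in CHILDREN[s]]
    lo = {xm: low(pts, [g_fix(s, x) * math.prod(k[0][x] for k in kids) for x in (0, 1)])
          for xm, pts in Q[s].items()}
    up = {xm: upp(pts, [g_fix(s, x) * math.prod(k[1][x] for k in kids) for x in (0, 1)])
          for xm, pts in Q[s].items()}
    return lo, up


def mix(a, lo_prod, up_prod):
    """max{a, 0} * lower product + min{a, 0} * upper product, as in (18)-(19)."""
    return max(a, 0.0) * lo_prod + min(a, 0.0) * up_prod


def message(s, g, mu):
    """pi_s^mu for a node s on the path to t, Eqs. (17)-(19)."""
    if s == TARGET:
        kids = [pair(c) for c in CHILDREN[s]]
        psi = [mix(g[x] - mu, math.prod(k[0][x] for k in kids),
                   math.prod(k[1][x] for k in kids)) for x in (0, 1)]
    else:
        c = next(c for c in CHILDREN[s] if c in ON_PATH)
        sibs = [pair(k) for k in CHILDREN[s] if k != c]   # the siblings S(c)
        pc = message(c, g, mu)
        psi = [g_fix(s, x) * mix(pc[x], math.prod(k[0][x] for k in sibs),
                                 math.prod(k[1][x] for k in sibs)) for x in (0, 1)]
    return {xm: low(pts, psi) for xm, pts in Q[s].items()}


def updated_lower(g, tol=1e-12):
    """Right-most zero of rho_g(mu) = message(root) by bisection on [min g, max g]."""
    lo, hi = min(g), max(g)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if message(1, g, mid)[None] >= 0:
            lo = mid
        else:
            hi = mid
    return lo


def brute_force(g):
    """Minimum of P(g|x_E) over joints that pick one extreme point for every
    node and every configuration of that node's ancestors."""
    anc = {1: (), 2: (1,), 3: (1,), 4: (1, 2)}
    keys = [(s, h) for s in Q for h in product((0, 1), repeat=len(anc[s]))]
    best = math.inf
    for choice in product((0, 1), repeat=len(keys)):
        sel = dict(zip(keys, choice))
        num = den = 0.0
        for x in product((0, 1), repeat=4):
            val = dict(zip((1, 2, 3, 4), x))
            if any(val[e] != v for e, v in EVIDENCE.items()):
                continue
            pr = 1.0
            for s in Q:
                mother_value = val[MOTHER[s]] if s in MOTHER else None
                point = Q[s][mother_value][sel[(s, tuple(val[a] for a in anc[s]))]]
                pr *= point[val[s]]
            num += pr * g[val[TARGET]]
            den += pr
        best = min(best, num / den)
    return best


print(updated_lower((1.0, 0.0)), brute_force((1.0, 0.0)))  # both about 0.3488
```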

There is a further simplification, because we are not necessarily interested in the actual value of $\rho_g(\mu)$, but rather in its sign. It arises whenever there are instantiated nodes above the target node: $E\cap A(t)\neq\emptyset$. Let in that case $e_t$ be the greatest element of the chain $E\cap A(t)$, i.e., the instantiated node closest to and preceding the target node $t$, and let $s_t$ be its successor in the chain ${\uparrow}t$; see for instance Fig. 1. If we let

$$\lambda_g(\mu) := \max\{\pi_{s_t}^\mu(x_{e_t}),0\}\prod_{c\in S(s_t)}\underline{\pi}_c(x_{e_t}) + \min\{\pi_{s_t}^\mu(x_{e_t}),0\}\prod_{c\in S(s_t)}\overline{\pi}_c(x_{e_t}),$$

then it follows from Eq. (19) [with $s=s_t$ and $m(s)=e_t$] that $\psi_{e_t}^\mu = \mathbb{I}_{\{x_{e_t}\}}\lambda_g(\mu)$. If we now continue to use Eqs. (18) and (19) until we reach the root of the tree, we eventually find that$^{20}$

$$\rho_g(\mu) = \begin{cases}\underline{P}(\{x_E\})\,\lambda_g(\mu) & \text{if } \lambda_g(\mu)\geq0,\\ \overline{P}(\{x_E\})\,\lambda_g(\mu) & \text{if } \lambda_g(\mu)<0.\end{cases} \qquad (21)$$

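Since we assumed from the outset that $\overline{P}(\{x_E\})>0$, we gather from Eq. (12) that $\underline{R}_t(g|x_E)=\max\{\mu\in\mathbb{R}:\lambda_g(\mu)\geq0\}$. Moreover, by combining Eqs. (14) and (16) with Proposition 4, we find that $\overline{\pi}_c(x_{e_t}) = \overline{P}_{\downarrow c}(\{x_{E\cap\downarrow c}\}|x_{e_t})>0$ for all $c\in S(s_t)$, and therefore $\lambda_g(\mu)\geq0\Leftrightarrow\pi_{s_t}^\mu(x_{e_t})\geq0$. Hence $\underline{R}_t(g|x_E)=\max\{\mu\in\mathbb{R}:\pi_{s_t}^\mu(x_{e_t})\geq0\}$.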

We conclude that in order to update the tree in the situation described above, we can perform all calculations on the sub-tree ${\downarrow}s_t$, where the new root $s_t$ has local model $\underline{Q}_{s_t}(\cdot|x_{e_t})$. This is also borne out by the discussion of the separation properties in Section 5.






[Figure 1 legend: observed node; queried or target node $t$; unobserved node.]

Figure 1. Example imprecise Markov tree. The target node is $t=10$, $e_t=2$ is the 'greatest' observed ancestor of $t$ and $s_t=3$ is the child of $e_t$ that precedes $t$. The bolder arrows represent the trunk $\mathcal{T}=\{3,4,10\}$ of the tree.

6.3. An algorithm. We now convert these observations into a workable algorithm. 



$^{20}$Actually, we easily derive that $\rho_g(\mu) = a\max\{\lambda_g(\mu),0\} + b\min\{\lambda_g(\mu),0\}$, where $a$ and $b$ are real constants that do not depend on $g$ and $\mu$. Letting $g := \mu\pm1$ then allows us to identify the constants $a$ and $b$.
Using regular extension and message passing, we are able to compute $\underline{R}_t(g|x_E)$: we (i) choose any $\mu\in[\min g,\max g]$; (ii) calculate the value of $\rho_g(\mu)$ by sending messages from the terminal nodes towards the root; and (iii) repeat this in some clever way to find the maximal $\mu$ that makes $\rho_g(\mu)$ zero. But we have seen above that this naive approach can be sped up by exploiting (a) the separation properties of the tree, and (b) the independence of $\mu$ (and $g$) for some of the messages, namely those associated with nodes that do not precede the target node $t$.

For a start, as we are only interested in the sign of $\rho_g(\mu)$ [or equivalently, that of $\lambda_g(\mu)$], which we have seen is determined by the sign of $\pi_{s_t}^\mu(x_{e_t})$, we only have to take into consideration nodes that strictly follow $e_t$.

The next thing a smarter implementation of the algorithm can do is determine the trunk $\mathcal{T}$ of the tree: those nodes that precede the queried node $t$ and strictly follow the greatest observed node $e_t$ preceding $t$. We can define the trunk more formally as follows: $\mathcal{T} := {\uparrow}t\cap{\downarrow}C(e_t)$. For the tree in Fig. 1, for instance, where the darker $X_{10}$ is the queried variable and the lighter nodes $\{2,6,7,8,9,11,14,15\}$ are instantiated, the trunk is given by $\mathcal{T}=\{3,4,10\}$, and indicated by bolder arrows.
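Finding $e_t$ and the trunk only requires walking up from $t$ towards the root. A minimal Python sketch, assuming the tree is stored as a hypothetical mother map `MOTHER` in which only the chain $2\rightarrow3\rightarrow4\rightarrow10$ mirrors the Fig. 1 caption and the other nodes are invented:

```python
# Hypothetical mother map (the root 1 has no mother). Only the chain
# 2 -> 3 -> 4 -> 10 is taken from the Fig. 1 caption; the rest is made up.
MOTHER = {2: 1, 3: 2, 4: 3, 10: 4, 5: 1, 6: 3, 7: 4, 11: 10}


def chain_up(t):
    """t followed by its ancestors, from t up to the root."""
    chain = [t]
    while chain[-1] in MOTHER:
        chain.append(MOTHER[chain[-1]])
    return chain


def trunk(t, evidence):
    """Return (e_t, trunk): e_t is the instantiated ancestor of t closest to t
    (None if there is none), and the trunk holds the nodes that precede t and
    strictly follow e_t. Without e_t the whole chain is returned (an assumption:
    the text only defines the trunk when e_t exists)."""
    nodes = set()
    for s in chain_up(t):
        if s != t and s in evidence:
            return s, nodes
        nodes.add(s)
    return None, nodes


e_t, nodes = trunk(10, {2, 6, 7, 11})
print(e_t, sorted(nodes))   # 2 [3, 4, 10]
```

The walk costs at most the depth of $t$, so it does not affect the linear complexity discussed below.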




Figure 2. Calculation of $\Pi_4=(\underline{\Pi}_4,\overline{\Pi}_4)$, which is a summary of the $\mu$-independent messages in the trunk node 4.

We have a special interest in the nodes that constitute the trunk, because only they will send messages to their mother nodes that actually depend on $\mu$. As a consequence, all other nodes (all descendants of the trunk that are not in the trunk themselves) send messages that have to be calculated only once. This implies that we can summarise all the $\mu$-independent messages by propagating all of them until they reach the trunk. The $\mu$-independent messages $\pi_c$ that arrive in a trunk node $s$ can be represented more succinctly by their point-wise products $\underline{\Pi}_s := \prod_{c\in C(s)\setminus\mathcal{T}}\underline{\pi}_c$ and $\overline{\Pi}_s := \prod_{c\in C(s)\setminus\mathcal{T}}\overline{\pi}_c$, because Eqs. (18) and (19) only depend on them through these products.

This means that for every trunk node $s\in\mathcal{T}$, we have to find the lower (upper) messages $\underline{\pi}_c$ ($\overline{\pi}_c$) of every child $c$ of $s$ that is not in the trunk itself. Both $\underline{\pi}_c$ and $\overline{\pi}_c$ can be calculated recursively using Eq. (16), where the recursion starts at the leaves and moves up to (but stops right before) the trunk. In the leaves, the local lower and upper previsions of the indicator of the evidence are sent upwards if the leaf is instantiated; if not, the constant 1 is sent up, which is equivalent to deleting the node from the tree. We could envisage removing barren nodes (all of whose descendants are uninstantiated, such as $X_{13}$ in the example tree above) from the tree beforehand, but we believe the computational overhead created by the search for them will void the gain.

The only recursion that is still left to do is the calculation of the $\mu$-dependent messages $\pi_s^\mu$ along the trunk. As demonstrated in Fig. 3, we can calculate $\pi_{s_t}^\mu(x_{e_t})$ using the following recursion formula:

$$\pi_s^\mu = \begin{cases}\underline{Q}_s\big(\max\{g-\mu,0\}\underline{\Pi}_s + \min\{g-\mu,0\}\overline{\Pi}_s\,\big|\,X_{m(s)}\big) & \text{if } s=t,\\ \underline{Q}_s\big(\max\{\pi_c^\mu,0\}\underline{\Pi}_s + \min\{\pi_c^\mu,0\}\overline{\Pi}_s\,\big|\,X_{m(s)}\big) & \text{if } s\in\mathcal{T}\setminus\{t\} \text{ and } C(s)\cap\mathcal{T}=\{c\}.\end{cases}$$

These formulas are reformulations of Eqs. (17)-(19), where the role of the products $\underline{\Pi}_s$ and $\overline{\Pi}_s$ has been made explicit.

[Figure 3: the trunk messages for the tree of Fig. 1, for instance $\pi_{10}^\mu = \underline{Q}_{10}(\max\{g-\mu,0\}\underline{\Pi}_{10}+\min\{g-\mu,0\}\overline{\Pi}_{10}|X_4)$ and $\pi_4^\mu = \underline{Q}_4(\max\{\pi_{10}^\mu,0\}\underline{\Pi}_4+\min\{\pi_{10}^\mu,0\}\overline{\Pi}_4|X_3)$.]

Figure 3. Calculation of $\pi_{s_t}^\mu(x_{e_t})$, whose sign is the same as that of $\lambda_g(\mu)$.

[Figure 4: graph of $\rho_g$ through the points $(a,\rho_g(a))$, $(b,\rho_g(b))$, $(c,\rho_g(c))$ and $(d,\rho_g(d))$, with pseudocode for the iteration described in the caption.]

Figure 4. The root of a concave and non-increasing function $\rho_g$ whose values $\rho_g(a)>\rho_g(b)>0>\rho_g(c)>\rho_g(d)$ are known will always be in the interval $[p,m]$ with $m:=\min\{q,r\}$. Here $p$, $q$ and $r$ are the intersections with the horizontal axis of the straight lines through $(b,\rho_g(b))$ and $(c,\rho_g(c))$, $(c,\rho_g(c))$ and $(d,\rho_g(d))$, and $(a,\rho_g(a))$ and $(b,\rho_g(b))$, respectively. The next function evaluation of $\rho_g$ will be in $t$, which bisects the error interval $[p,m]$. If $\rho_g(t)>0$, then $a$ becomes $b$ and $b$ becomes $t$; otherwise $d$ becomes $c$ and $c$ becomes $t$, and a new interval $[p,m]$ and matching $t$ can be calculated. We stop iterating as soon as the error interval $[p,m]$ is smaller than a given tolerance tol, or $\rho_g(t)$ is exactly zero.

Since we now know how to calculate $\pi_{s_t}^\mu(x_{e_t})$, we can tackle the final problem: find the maximal $\mu$ for which $\pi_{s_t}^\mu(x_{e_t})=0$. In principle, a secant root-finding method could be used, but by exploiting the concavity and non-increasing character of $\pi_{s_t}^\mu(x_{e_t})$ as a function of $\mu$, we can speed up the calculation of the maximal root drastically, as shown in Fig. 4.
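The bracketing scheme described in the caption of Fig. 4 can be sketched in Python as follows. This is our reading of the caption, not the authors' code, and it makes two assumptions the caption leaves open: the bounds $q$ and $r$ start at the right end of the search interval until the points $d$ and $a$ exist, and a value $\rho_g(t)\geq0$ (rather than $>0$) moves the left end, so that a function vanishing on a whole half-line (zero lower probability of the evidence) is handled without stopping early. The function returns the lower end $p$ of the final bracket, which does not exceed the right-most zero.

```python
def rightmost_root(rho, lo, hi, tol=1e-9, max_iter=200):
    """Right-most zero of a concave, non-increasing rho with rho(lo) >= 0.

    Keeps b (rho(b) >= 0) and c (rho(c) < 0). The zero lies in [p, m]: p is
    where the chord through b and c crosses zero, and m = min(q, r), with q
    (resp. r) where the line through c and d (resp. a and b) crosses zero.
    """
    f_hi = rho(hi)
    if f_hi >= 0:
        return hi                      # the zero cannot lie beyond hi
    b, fb, c, fc = lo, rho(lo), hi, f_hi
    q = r = hi                         # trivial upper bounds until d and a exist
    for _ in range(max_iter):
        p = (fc * b - fb * c) / (fc - fb)
        m = min(q, r)
        if m - p <= tol:
            return p
        t = 0.5 * (p + m)              # bisect the error interval [p, m]
        ft = rho(t)
        if ft >= 0:
            a, fa, b, fb = b, fb, t, ft
            if fa != fb:               # a horizontal line gives no bound
                r = (fa * b - fb * a) / (fa - fb)
        else:
            d, fd, c, fc = c, fc, t, ft
            if fc != fd:
                q = (fc * d - fd * c) / (fc - fd)
    return (fc * b - fb * c) / (fc - fb)


print(rightmost_root(lambda mu: min(1 - mu, 0.5 - 0.25 * mu), 0.0, 4.0))  # about 1.0
print(rightmost_root(lambda mu: min(0.2 - 0.5 * mu, 0.0), 0.0, 5.0))      # about 0.4
```

On piecewise-linear functions such as the first example, the chord and line bounds often meet after a few evaluations, whereas plain bisection would need one evaluation per halving.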

Let us briefly discuss the complexity of our algorithm. Consider for a start that for a fixed $\mu$ each node makes a single local computation and then propagates the result to its mother node: this implies that, with $\mu$ fixed, the algorithm is linear in the number of nodes. Iterating on $\mu$ then amounts to multiplying such a linear complexity by the number of iterations. This number depends on the function $g$, as the iterations are made to compute a root that is known to belong to the real interval $[\min g,\max g]$. If we assume, for the sake of simplicity, that the bisection algorithm is employed to find the root, and let $r:=\max g-\min g$ be the range of the function, then the number of iterations is bounded by $\log_2\frac{r}{tol}+1$, where $tol$ is some fixed tolerance. In other words, the number of iterations is linear in the number $b$ of bits needed to represent $r$ in base 2. This means that the overall complexity of the algorithm is $O(b\cdot|T|)$, taking into account that the computational complexity of our root-finding algorithm is lower than that of the bisection (and actually also of the secant) algorithm. Since $b$ will be a small number$^{21}$ in most cases (e.g., when the focus is on probabilities), we simply refer to the complexity of our algorithm as linear in the number of nodes.
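The iteration count quoted for bisection is easy to check: halving an interval of length $r$ until it is at most $tol$ takes $\lceil\log_2(r/tol)\rceil$ steps, which never exceeds $\log_2(r/tol)+1$. A small Python check (the function names are hypothetical):

```python
import math


def bisection_iterations(r, tol):
    """Number of halvings needed to shrink an interval of length r to at most tol."""
    n, width = 0, r
    while width > tol:
        width /= 2
        n += 1
    return n


def iteration_bound(r, tol):
    """The bound log2(r / tol) + 1 used in the complexity estimate."""
    return math.log2(r / tol) + 1


print(bisection_iterations(1.0, 1e-6), iteration_bound(1.0, 1e-6))  # 20 and about 20.93
```

For probabilities ($r\leq1$) and a tolerance of $10^{-6}$, each update therefore costs about twenty passes over the tree in the worst case.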



7. A simple example involving dilation

We present a very simple example that allows us to (i) follow the inference method 
discussed above in a step-by-step fashion; (ii) see that there are separation properties for 
credal nets under strong independence that fail for credal trees under epistemic irrelevance; 
and (iii) see that in that case we will typically observe dilation. 

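Consider the following imprecise Markov chain, where the values $x_2$ and $x_3$ of $X_2$ and $X_3$ are observed and $X_1$ is queried:

$$X_1\rightarrow X_2\rightarrow X_3$$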

To make things as simple as possible, we suppose that $\mathcal{X}_1=\{a,b\}$ and that $\underline{Q}_1$ is a linear (or precise, or expectation-like) model $Q_1$ with mass function $q$. We also assume that $\underline{Q}_2(\cdot|X_1)$ is a linear model $Q_2(\cdot|X_1)$ with conditional mass function $q(\cdot|X_1)$. We make no such restrictions on the local model $\underline{Q}_3(\cdot|X_2)$. We also use the following simplifying notational device: if we have three real numbers $\underline{\kappa}$, $\overline{\kappa}$ and $\gamma$, we let

$$\kappa(\gamma) := \underline{\kappa}\max\{\gamma,0\} + \overline{\kappa}\min\{\gamma,0\}.$$

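We observe $X_2=x_2$ and $X_3=x_3$, and want to make inferences about the target variable $X_1$: for any $g\in\mathcal{L}(\mathcal{X}_1)$, we want to know $\underline{R}_1(g|x_{\{2,3\}})$. Letting $\underline{r}:=\underline{R}_1(\{a\}|x_{\{2,3\}})$ and $\overline{r}:=\overline{R}_1(\{a\}|x_{\{2,3\}})$, we infer from the separate coherence of $\underline{R}_1(\cdot|x_{\{2,3\}})$ that it suffices to calculate $\underline{r}$ and $\overline{r}$, because

$$\underline{R}_1(g|x_{\{2,3\}}) = g(b) + r(g(a)-g(b)).$$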

We let $g^\mu=[\mathbb{I}_{\{a\}}-\mu]\mathbb{I}_{\{x_2\}}\mathbb{I}_{\{x_3\}}$, and apply the approach of the previous section. We see that the trunk is $\mathcal{T}=\{1\}$, and the instantiated leaf node 3 sends up the messages $\pi_3=Q_3(\{x_3\}|X_2)$ to the instantiated node 2, which transforms them into the messages

$$\pi_2 = Q_2(\{x_2\}|X_1)\,\pi_3(x_2) =: q(x_2|X_1)\,q,$$

where we let $q(x_2|X_1):=Q_2(\{x_2\}|X_1)$ and $q:=(\underline{q},\overline{q}):=\pi_3(x_2)$. These messages are sent up to the (target) root node $t=1$, which transforms them into the message $\pi_1^\mu=Q_1(\psi_1^\mu)$ with $\psi_1^\mu=q(x_2|X_1)\,q(\mathbb{I}_{\{a\}}-\mu)$. If we also use that $0\leq\mu\leq1$, this leads to

$$\rho_g(\mu) = \pi_1^\mu = q(a)q(x_2|a)\underline{q}[1-\mu] + q(b)q(x_2|b)\overline{q}[-\mu],$$

so we find after applying regular extension that

$$\underline{r} = \underline{R}_1(\{a\}|x_{\{2,3\}}) = \frac{q(a)q(x_2|a)\underline{q}}{q(a)q(x_2|a)\underline{q}+q(b)q(x_2|b)\overline{q}}$$

and, similarly,

$$\overline{r} = \overline{R}_1(\{a\}|x_{\{2,3\}}) = \frac{q(a)q(x_2|a)\overline{q}}{q(a)q(x_2|a)\overline{q}+q(b)q(x_2|b)\underline{q}}.$$

$^{21}$It could be argued that $b$ should be bounded given the finiteness of a computer's way to represent numbers.

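When $\underline{q}=\overline{q}$, which happens for instance if the local model for $X_3$ is precise, then we see that, with obvious notations,

$$\underline{r} = \overline{r} = \frac{q(a)q(x_2|a)}{q(a)q(x_2|a)+q(b)q(x_2|b)} = p(a|x_2),$$

and therefore $X_2$ indeed separates $X_3$ from $X_1$. But in general, letting $\alpha:=q(a)q(x_2|a)$ and $\beta:=q(b)q(x_2|b)$, we get

$$p(a|x_2)-\underline{r} = \frac{\alpha\beta(\overline{q}-\underline{q})}{(\alpha+\beta)(\alpha\underline{q}+\beta\overline{q})} \quad\text{and}\quad \overline{r}-p(a|x_2) = \frac{\alpha\beta(\overline{q}-\underline{q})}{(\alpha+\beta)(\alpha\overline{q}+\beta\underline{q})}.$$

As soon as $\overline{q}>\underline{q}$, $X_2$ no longer separates $X_3$ from $X_1$, and we witness dilation [14, 22] because of the additional observation of $X_3$!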
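The closed forms above can be cross-checked numerically. The Python sketch below uses hypothetical values for $q$, $q(x_2|\cdot)$, $\underline{q}$ and $\overline{q}$, evaluates $\rho_g(\mu)$ from the message $\psi_1^\mu$ derived above, finds its right-most zero by bisection on $[0,1]$, and obtains $\overline{r}$ as $1-\underline{R}_1(\{b\}|x_{\{2,3\}})$, which holds by separate coherence because $\mathbb{I}_{\{a\}}=1-\mathbb{I}_{\{b\}}$ on $\mathcal{X}_1$.

```python
# Hypothetical numbers for the chain X1 -> X2 -> X3 with X1 in {a, b}.
q1 = {"a": 0.4, "b": 0.6}   # precise mass function q of X1
q2 = {"a": 0.7, "b": 0.2}   # precise q(x2|X1) of the observed value x2
q3_low, q3_upp = 0.3, 0.8   # lower and upper probability of the observed x3 given x2


def rho(mu, g):
    """rho_g(mu) = Q1(psi_1^mu), with psi_1^mu = q(x2|X1) q(g - mu) and
    q(gamma) = q3_low * max(gamma, 0) + q3_upp * min(gamma, 0)."""
    return sum(q1[x] * q2[x] * (q3_low * max(g[x] - mu, 0.0) + q3_upp * min(g[x] - mu, 0.0))
               for x in ("a", "b"))


def rightmost_zero(f, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the right-most zero of a non-increasing f with f(lo) >= 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) >= 0 else (lo, mid)
    return lo


alpha, beta = q1["a"] * q2["a"], q1["b"] * q2["b"]
r_low = alpha * q3_low / (alpha * q3_low + beta * q3_upp)   # closed form for r_low
r_upp = alpha * q3_upp / (alpha * q3_upp + beta * q3_low)   # closed form for r_upp
p_a = alpha / (alpha + beta)                                # p(a|x2)

r_low_num = rightmost_zero(lambda mu: rho(mu, {"a": 1.0, "b": 0.0}))
r_upp_num = 1.0 - rightmost_zero(lambda mu: rho(mu, {"a": 0.0, "b": 1.0}))
print(r_low_num, r_low)   # both about 0.4667
print(p_a)                # about 0.7, strictly inside [r_low, r_upp]
print(r_upp_num, r_upp)   # both about 0.8615
```

With these numbers the precise posterior $p(a|x_2)=0.7$ dilates to roughly $[0.47, 0.86]$ once $x_3$ is also observed.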

8. Numerical comparison with strong independence 

Strong independence implies epistemic irrelevance, and hence inferred (lower-upper) probability intervals for imprecise Markov trees with epistemic irrelevance will include those obtained assuming strong independence. This suggests that our algorithm could also be used as a tool to make conservative (also called outer) approximations of the computations made in a credal tree under strong independence. This could be an important application of our algorithm, since at the moment it is unclear whether or not updating probabilities in a tree is a polynomial task under strong independence. If it were not, addressing the problem would definitely benefit from the availability of fast approximations.

With this idea in mind, we make a preliminary empirical exploration of the quality of the 
approximation. As noted in Section 5, the two models have different separation properties: 
this is particularly important when evidence is back-propagated from leaves to root. For this 
reason, we compare posterior probability intervals for the root variable of a chain where
only the leaf node is instantiated. 

Fig. 5 reports the results of this comparison for chains with binary nodes, randomly generated local models, and variable length (from 5 up to 100 nodes). The algorithm in Section 6 has been used to compute the posterior probability intervals in the chains under epistemic irrelevance, while the 2U algorithm [13] was used for updating in the chains under strong independence. The inferred probability intervals for the former turn out to be clearly wider, and the mean difference between the two intervals is about 0.3 irrespective of the length of the chain, at least for chains with more than ten nodes.

For non-binary nodes there are no efficient algorithms known for updating chains with strong independence. We used the procedure in [6] to update chains with fewer than seven ternary nodes and credal sets with three randomly generated extreme points in the strong independence case. A similar difference between the posterior intervals was observed also in these cases. For longer chains, updating the chain under strong independence is too slow and no comparison can be made. In summary, there is a non-negligible difference between inferences based on the two notions of 'independence'. This means that the epistemic approximations to the strong case could be quite crude in practice. However, their being outer (that is, safe) approximations, together with their light complexity, could still make them very useful tools whenever the strong independence approach is deemed necessary or appropriate.

Figure 5. Numerical evaluation of the difference between the sizes of the posterior intervals for inferences on credal chains over binary variables with epistemic irrelevance and strong independence. The plot reports the mean difference (in ordinate) as a function of the number of nodes in the chains (in abscissa). Means are estimated over 200 Monte Carlo runs.

9. An application

The tree topology of the graphs considered in this paper is expressive enough to model useful and interesting problems. These problems can then be solved efficiently by means of the algorithm described in the previous sections. We make this point clearer with an example application about character recognition. This is also an opportunity to illustrate the differences between the traditional, precise-probability, approach to the problem and the imprecise-probability one. Most notably, these differences arise because the imprecise-probability methods come with the inherent ability to suspend judgement when the information available is deemed insufficient to reliably recognise a character, whereas the precise-probability ones do not.

9.1. Imprecise hidden Markov models. Hidden Markov models (HMMs) [21] are popular tools for modelling a sequence of hidden variables that generate a related sequence of observable variables. These are respectively referred to as the generative and observable sequences. HMMs have applications in many areas of signal processing, and more specifically in speech and text recognition.

Both the generative and the observable sequence are described by sets of variables over the same domain $\mathcal{X}$, denoted respectively by $X_{S_1},\dots,X_{S_n}$ and $X_{O_1},\dots,X_{O_n}$. The independence assumptions between these variables, which characterise HMMs, are those corresponding to the tree structure below. Informally, this topology states that every element of the generative sequence depends only on its predecessor, while each observation depends only on the corresponding element of the generative sequence.

generative sequence:    X_{S_1} → X_{S_2} → ... → X_{S_n}
                           ↓         ↓               ↓
observable sequence:    X_{O_1}   X_{O_2}   ...   X_{O_n}

A local uncertainty model should be defined for each variable. In the case of precise probabilistic assessments, this corresponds to linear (precise, or expectation-like) versions of the local models $\underline{Q}_{S_1}$, $\underline{Q}_{S_k}(\cdot|X_{S_{k-1}})$ and $\underline{Q}_{O_k}(\cdot|X_{S_k})$, for $k = 1,\dots,n$ (with $k \geq 2$ for the transition models), where the conditional models are assumed to be stationary, i.e., independent of $k$. These models represent, respectively, beliefs about the first state in the generative sequence, the transitions between adjacent states, and the observation process.

Bayesian techniques for learning from multinomial data are usually employed for identifying these models. But, especially if only few data are available, other methods leading to imprecise assessments, such as the imprecise Dirichlet model (IDM, [25]), might offer a more realistic model of the local uncertainty. For example, for the unconditional local model $\underline{Q}_{S_1}$, applying the IDM leads to the following simple identification:

$$\underline{Q}_{S_1}(f) = \frac{\sum_{x\in\mathcal{X}} n(x)\,f(x) + s\min f}{s + \sum_{x\in\mathcal{X}} n(x)} \quad\text{for all gambles } f \text{ on } \mathcal{X}, \qquad (23)$$

where $n(x)$ counts the units in the sample for which $X_{S_1} = x$, and $s$ is a (positive real) hyperparameter that expresses the degree of caution in the inferences. For the conditional local models, we can proceed similarly. This leads to the identification of an imprecise HMM, a special credal tree under epistemic irrelevance, like the ones introduced in Section 2.
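As an illustration of the IDM identification of the unconditional model, the following minimal sketch computes IDM lower and upper previsions from counts $n(x)$ and the hyperparameter $s$, using the standard IDM formula $\underline{Q}(f) = (\sum_x n(x) f(x) + s\min f)/(N + s)$, with $N$ the sample size. The function names and the dictionary representation of counts and gambles are our own illustrative choices.

```python
def idm_lower_prevision(counts, gamble, s=2.0):
    """IDM lower prevision of a gamble on a finite space.

    counts: dict state -> number of sample units with that state (include zeros);
    gamble: dict state -> real value, defined on the same states;
    s: positive hyperparameter expressing the degree of caution.
    """
    total = sum(counts.values())
    weighted = sum(counts[x] * gamble[x] for x in counts)
    return (weighted + s * min(gamble.values())) / (total + s)


def idm_upper_prevision(counts, gamble, s=2.0):
    # conjugacy: the upper prevision of f is minus the lower prevision of -f
    return -idm_lower_prevision(counts, {x: -v for x, v in gamble.items()}, s)


def idm_probability_interval(counts, state, s=2.0):
    # lower and upper probability of a single state
    total = sum(counts.values())
    return counts[state] / (total + s), (counts[state] + s) / (total + s)
```

For a conditional local model such as a transition model, the same functions could be applied separately for each value of the conditioning variable, using only the transition counts that start from that value.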

Generally speaking, the algorithm described in Section 6 can be used for computing inferences with such imprecise HMMs. Below, we address the more specific problem of on-line recognition, which consists in the identification of the most likely value of $X_{S_n}$ given the evidence for the whole observational sequence $X_{O_1} = x_{O_1},\dots,X_{O_n} = x_{O_n}$. For precise local models, this problem requires the computation of the state

$$x^*_{S_n} := \arg\max_{x_{S_n}\in\mathcal{X}} P(\{x_{S_n}\}|x_{O_1},\dots,x_{O_n})$$

that is most probable after the observation. For imprecise local models different criteria can be adopted; see [23] for an overview. We consider maximality: we order the states by $x_{S_n} \succ z_{S_n}$ if and only if $\underline{P}(I_{\{x_{S_n}\}} - I_{\{z_{S_n}\}}|x_{O_1},\dots,x_{O_n}) > 0$, and we look for the undominated, or maximal, states under this order. This may produce indeterminate predictions: the set of undominated states may have more than one element.

9.2. On-line character recognition. As a very first application of the imprecise HMM, we have considered a character recognition problem.* A written text was regarded as a generative sequence, while the observable sequence was obtained by artificially corrupting the text. This is a model for a not perfectly reliable observation process, such as the output of an OCR device. The local models were identified using the IDM, as in (23), by counting the occurrences of single characters and the 'transitions' from one character to another in the generative sequence, and by counting the matches between corresponding elements of the two sequences. By modelling text as a generative sequence, we obviously ignore any dependence there might be between a character and its n-th predecessor, for any n ≥ 2. A better, albeit still not completely realistic, model would resort to using n-grams (i.e., clusters of n characters with n ≥ 2) instead of monograms. Such models might lead to higher accuracy, but they need larger data sets for their quantification, because of the exponentially larger number of possible transitions for which probabilities have to be estimated. The figure below depicts how on-line recognition through HMM might apply to this setup.

[Figure: original text "VITA" (generative sequence) and its OCR output (observable sequence).]

The performance of the precise model can be characterised by its accuracy (the percentage of correct predictions) alone. The imprecise HMM requires more indicators. We follow [2] in using the following:

determinacy: percentage of determinate predictions,

set-accuracy: percentage of indeterminate predictions containing the right state,

single accuracy: percentage of correct predictions, computed considering only determinate predictions, and

indeterminate output size: average number of states returned when the prediction is indeterminate, out of the number of possible states.
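The four indicators can be computed directly from a list of predicted state sets and the true states. A minimal sketch; the function name, the set-based input format and the extra `relative_output_size` entry are our own choices.

```python
import math


def imprecise_indicators(predictions, truths, n_states):
    """predictions: list of sets of predicted states; truths: list of true states."""
    pairs = list(zip(predictions, truths))
    determinate = [(p, t) for p, t in pairs if len(p) == 1]
    indeterminate = [(p, t) for p, t in pairs if len(p) > 1]

    def hit_rate(items):
        # fraction of items whose predicted set contains the true state
        return sum(t in p for p, t in items) / len(items) if items else math.nan

    avg_size = (sum(len(p) for p, _ in indeterminate) / len(indeterminate)
                if indeterminate else math.nan)
    return {
        "determinacy": len(determinate) / len(pairs),
        "set_accuracy": hit_rate(indeterminate),
        "single_accuracy": hit_rate(determinate),
        "indeterminate_output_size": avg_size,
        "relative_output_size": avg_size / n_states,
    }
```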
* For a more involved application, related to aircraft trajectory model tracking, see [1].
Precise HMM
  Accuracy                                              93.96%                   (7275/7743)
  Accuracy (when the imprecise HMM is indeterminate)   64.97%                   (243/374)

Imprecise HMM
  Determinacy                                           95.17%                   (7369/7743)
  Set-accuracy                                          93.58%                   (350/374)
  Single accuracy                                       95.43%                   (7032/7369)
  Indeterminate output size                             2.97 out of 21 classes   (1112/374)

Table 1. Precise vs. imprecise HMMs. Test results obtained by twofold cross-validation on the first two cantos of Dante's Divina Commedia and n = 2. Quantification is achieved by IDM with s = 2 and Perks' prior modified as suggested in [28, Section 5.2]. The single-character output by the precise model is then guaranteed to be included in the set of characters the imprecise HMM identifies.



Recognition using our algorithm is fast: it never takes more than one second per character. Table 1 reports descriptive values for a large set (7743) of simulations, and a comparison with the performance of the precise model. Imprecise HMMs give quite accurate predictions. In contrast with the precise model, there are 'indeterminate' instances for which they do not output a single state. Yet this happens rarely (in 374 of the 7743 cases), and even then we witness a remarkable reduction in the number of undominated states (from the 21 letters of the Italian alphabet to fewer than 3 on average). Interestingly, the instances for which the imprecise-probability model returns more than one state appear to be 'difficult' for the precise-probability model: the accuracy of the precise model drops sharply (from 93.96% to 64.97%) if we focus only on these instances, while the imprecise model displays basically the same performance here as for other instances (a set-accuracy of 93.58%), by returning about three characters instead of a single one.



10. Conclusions

We have defined imprecise-probability (or credal) trees using Walley's notion of epistemic irrelevance. Credal trees generalise tree-shaped Bayesian nets in two ways: by allowing the parameters of the tree to be imprecisely specified, and by replacing the notion of stochastic independence with that of epistemic irrelevance. Our focus on epistemic irrelevance is the most original aspect of this work, as this notion has so far received limited attention in the context of credal nets.

We have focused in particular on developing an efficient exact algorithm for updating beliefs on the tree. Like the algorithms developed for precise graphical models, our algorithm works in a distributed fashion by passing messages along the tree. It computes lower and upper conditional previsions (expectations) with a complexity that is linear in the number of nodes in the tree. This is remarkable because until now it was unclear whether an algorithm with these features was at all feasible: in fact, epistemic irrelevance is most easily formulated using coherent lower previsions, which have never before been used as such in practical applications of credal nets. Moreover, it is at this point not clear that epistemic irrelevance is as 'well-behaved' as strong independence with respect to the graphoid axioms for propagation of probability in graphical models [5, 19].† Our results therefore appear very encouraging, and seem to have the potential to open up new avenues of research in credal nets.

† Unlike credal nets based on strong independence, a credal net based on epistemic irrelevance cannot generally be seen as equivalent to a set of Bayesian nets with the same graphical structure: if it were, then all separation properties of Bayesian nets would simply be inherited, and we have seen in Section 7 that such is not the case.
On the theoretical side, we have also shown that our credal trees satisfy the important rationality requirement of coherence. This has been established under the assumption that the upper probability of any possible observation in the tree is positive, which is a very mild requirement. The same assumption also allowed us to show that all inferences made by updating the tree will be coherent with each other as well as with the local uncertainty models in the nodes of the tree.

On the applied side, we have presented an application of the credal tree model to the 
problem of character recognition, where the parameters of the model are inferred from 
data. The empirical results are positive, especially because they show that our credal trees 
are able to make more reliable predictions than their precise-probability counterparts. 

Where to go from here? There are many possible avenues for future research. 

It would be very useful to be able to extend the algorithm at least to so-called polytrees, 
which are substantially more expressive graphs than trees are. This could be a difficult task 
to achieve. In fact, updating credal nets based on strong independence is an NP-hard task 
when the graph is more general than a tree [7]. Similar problems might affect the algorithms
for credal nets based on irrelevance. 

For applications, it would be very important to develop statistical methods specialised 
for credal nets under irrelevance that avoid introducing excessive imprecision in the process 
of inferring probabilities from data. This could be achieved, for instance, by using a single 
global IDM over the variables of the tree rather than many local ones, as we did in our 
experiments. 

Another research direction could be concerned with trying to strengthen the conclusions 
that epistemic trees lead to. There might be cases where our Markov condition based on 
epistemic irrelevance is too weak as a structural assessment. We have discussed situations 
where this type of Markov condition systematically leads to a dilation of uncertainty when 
updating beliefs with observations, and indicated that this dilation is related to the (lack 
of) certain separation properties induced by epistemic irrelevance on a graph. Dilation 
might not be desirable in some applications, and we could be called upon to strengthen the model in order to rule out such behaviour. One way to address the issue of dilation, though not necessarily the easiest, could consist in adding further irrelevance statements to the model, beyond those derived from the Markov condition. An easier avenue could be based on designing assumptions that, together with the Markov condition, lead to stronger separation properties, without necessarily requiring them to match the ones commonly used in Bayesian nets.
Appendix A. Proofs of important results

In this Appendix, we justify formulas (16) and (20), and give proofs for Propositions 3 
and 4, and Theorem 5. 

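Proof of Eqs. (16) and (20). Let us define the gambles $\phi_s := g_s\prod_{c\in C(s)}\phi_c$ for $s\in T$ (an empty product being equal to 1) and, with obvious notations, $\phi_S := \prod_{c\in S}\phi_c$ for non-empty $S\subseteq C(s)$.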

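Let the chain be given by $\{t_1,\dots,t_r\}$, where $t_1 := \square$, $t_r := t$ and $t_k := m(t_{k+1})$ for $k = 1,\dots,r-1$. If we apply the recursion equation (8) in $s = t_1$ and take into account the separate coherence and the strong factorisation of the conditionally independent natural extension $\underline{P}_{\downarrow C(t_1)}(\cdot|X_{t_1})$, we see that

$$\underline{P}(\phi_{t_1}) = \underline{Q}_{t_1}\big(\underline{P}_{\downarrow C(t_1)}(\phi_{t_1}|X_{t_1})\big) = \underline{Q}_{t_1}(\kappa_{t_1}),$$

where [provided that $t_1 \neq t$ and therefore $r > 1$]

$$\begin{aligned}
\kappa_{t_1} &:= \underline{P}_{\downarrow C(t_1)}(\phi_{t_1}|X_{t_1})\\
&= g_{t_1}\Big[\max\{\underline{P}_{\downarrow t_2}(\phi_{t_2}|X_{t_1}),0\}\,\underline{P}_{\downarrow C(t_1)\setminus\{t_2\}}\Big(\prod_{c\in C(t_1)\setminus\{t_2\}}\phi_c\Big|X_{t_1}\Big)\\
&\qquad + \min\{\underline{P}_{\downarrow t_2}(\phi_{t_2}|X_{t_1}),0\}\,\overline{P}_{\downarrow C(t_1)\setminus\{t_2\}}\Big(\prod_{c\in C(t_1)\setminus\{t_2\}}\phi_c\Big|X_{t_1}\Big)\Big] \qquad (24)\\
&= g_{t_1}\Big[\max\{\underline{P}_{\downarrow t_2}(\phi_{t_2}|X_{t_1}),0\}\prod_{c\in C(t_1)\setminus\{t_2\}}\underline{P}_{\downarrow c}(\phi_c|X_{t_1})\\
&\qquad + \min\{\underline{P}_{\downarrow t_2}(\phi_{t_2}|X_{t_1}),0\}\prod_{c\in C(t_1)\setminus\{t_2\}}\overline{P}_{\downarrow c}(\phi_c|X_{t_1})\Big]. \qquad (25)
\end{aligned}$$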

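Similarly, we find that

$$\underline{P}_{\downarrow t_2}(\phi_{t_2}|X_{t_1}) = \underline{Q}_{t_2}(\kappa_{t_2}|X_{t_1}),$$

where [provided that $t_2 \neq t$ and therefore $r > 2$] in a completely similar way as above

$$\begin{aligned}
\kappa_{t_2} &:= \underline{P}_{\downarrow C(t_2)}(\phi_{t_2}|X_{t_2}) \qquad (26)\\
&= g_{t_2}\Big[\max\{\underline{P}_{\downarrow t_3}(\phi_{t_3}|X_{t_2}),0\}\prod_{c\in C(t_2)\setminus\{t_3\}}\underline{P}_{\downarrow c}(\phi_c|X_{t_2})\\
&\qquad + \min\{\underline{P}_{\downarrow t_3}(\phi_{t_3}|X_{t_2}),0\}\prod_{c\in C(t_2)\setminus\{t_3\}}\overline{P}_{\downarrow c}(\phi_c|X_{t_2})\Big]. \qquad (27)
\end{aligned}$$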

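We can go on in this way until we come to $t_r = t$:

$$\underline{P}_{\downarrow t_r}(\phi_{t_r}|X_{t_{r-1}}) = \underline{Q}_{t_r}(\kappa_{t_r}|X_{t_{r-1}}), \qquad (28)$$

where, using the separate coherence and the strong factorisation of the conditionally independent natural extension $\underline{P}_{\downarrow C(t_r)}(\cdot|X_{t_r})$,

$$\begin{aligned}
\kappa_{t_r} &:= \underline{P}_{\downarrow C(t)}(\phi_t|X_t) = \underline{P}_{\downarrow C(t)}\Big(g_t\prod_{c\in C(t)}\phi_c\Big|X_t\Big)\\
&= \max\{g_t,0\}\,\underline{P}_{\downarrow C(t)}\Big(\prod_{c\in C(t)}\phi_c\Big|X_t\Big) + \min\{g_t,0\}\,\overline{P}_{\downarrow C(t)}\Big(\prod_{c\in C(t)}\phi_c\Big|X_t\Big)\\
&= \max\{g_t,0\}\prod_{c\in C(t)}\underline{P}_{\downarrow c}(\phi_c|X_t) + \min\{g_t,0\}\prod_{c\in C(t)}\overline{P}_{\downarrow c}(\phi_c|X_t). \qquad (29)
\end{aligned}$$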

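Clearly, if we can prove that $\underline{P}_{\downarrow s}(\phi_s|X_{m(s)}) = \pi_s$ for all $s \not\sqsubseteq t$ (and, in a completely similar way, the corresponding equalities for the upper previsions $\overline{P}_{\downarrow s}(\phi_s|X_{m(s)})$), it will follow from the considerations above that the same equalities hold for all $s \sqsubseteq t$, and then the proof is complete. This is what we now set out to do. Consider any $s \not\sqsubseteq t$. Then, applying the recursion equation (8) and taking into account the separate coherence and the strong factorisation of the conditionally independent natural extension $\underline{P}_{\downarrow C(s)}(\cdot|X_s)$, we see that, provided $s$ is not a terminal node, and with obvious notations,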

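$$\underline{P}_{\downarrow s}(\phi_s|X_{m(s)}) = \underline{Q}_s\big(\underline{P}_{\downarrow C(s)}(\phi_s|X_s)\big|X_{m(s)}\big), \qquad (30)$$

where

$$\underline{P}_{\downarrow C(s)}(\phi_s|X_s) = \underline{P}_{\downarrow C(s)}\Big(g_s\prod_{c\in C(s)}\phi_c\Big|X_s\Big) = g_s\,\underline{P}_{\downarrow C(s)}\Big(\prod_{c\in C(s)}\phi_c\Big|X_s\Big) = g_s\prod_{c\in C(s)}\underline{P}_{\downarrow c}(\phi_c|X_s). \qquad (31)$$

If on the other hand $s$ is a terminal node, then we can use Eq. (9) to find that

$$\underline{P}_{\downarrow s}(\phi_s|X_{m(s)}) = \underline{Q}_s(g_s|X_{m(s)}) = \pi_s, \qquad (32)$$

where the last equality follows from Eq. (15). Now combine Eqs. (30)-(32) and use recursion to complete the proof. □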

Proof of Proposition 3. Fix any $x_T$ in $\mathcal{X}_T$. We need to prove that $\underline{P}(\{x_T\}) > 0$ and that $\underline{P}_{\downarrow S}(\{x_{\downarrow S}\}|x_s) > 0$ for all non-terminal nodes $s$, non-empty $S\subseteq C(s)$ and $x_s\in\mathcal{X}_s$. Our argumentation is similar to a special case of the one in Section 6.2. We use the notation established there, but with in particular $g := 1$, $t := \square$ and $E := T$. This implies that $g_s = I_{\{x_s\}}$ and $\phi_s = I_{\{x_{\downarrow s}\}}$ for all $s\in T$. In accordance with Eq. (15), we define the messages $\pi_s\in\mathcal{L}(\mathcal{X}_{m(s)})$ and $\lambda_s\in\mathcal{L}(\mathcal{X}_s)$ recursively by:

$$\lambda_s := \prod_{c\in C(s)}\pi_c \quad\text{and}\quad \pi_s := \underline{Q}_s(I_{\{x_s\}}\lambda_s|X_{m(s)}) = \lambda_s(x_s)\,\underline{Q}_s(\{x_s\}|X_{m(s)}), \qquad s\in T, \qquad (33)$$

with, as before, by convention $\lambda_s := 1$ in all leaves $s$. The last equality follows from the separate coherence of the local models $\underline{Q}_s(\cdot|X_{m(s)})$ and the fact that all messages $\pi_s$ and $\lambda_s$ are non-negative. It is clear from the recursion equations (7) and (8) [see also Eq. (16); the proof is completely similar to that of Eqs. (16) and (20) given above] that $\underline{P}_{\downarrow s}(\{x_{\downarrow s}\}|X_{m(s)}) = \pi_s$ for all $s\in T$, and that $\underline{P}_{\downarrow C(s)}(\{x_{\downarrow C(s)}\}|X_s) = \lambda_s$ for all non-terminal $s$. Similarly, it follows from Eq. (11), conjugacy and the strong factorisation property of the conditionally independent natural extension that $\underline{P}_{\downarrow S}(\{x_{\downarrow S}\}|X_s) = \prod_{c\in S}\pi_c$ for all non-terminal $s$ and all non-empty $S\subseteq C(s)$. So we have to prove that all values (all components) of all messages $\pi_s$, $s\in T$, are (strictly) positive. This follows at once from the recursion equation (33) and the assumed strict positivity of the local models $\underline{Q}_s(\cdot|X_{m(s)})$. □

Proof of Proposition 4. Our argumentation is similar to a special case of the one in Section 6.2. We also use notation similar to that established there, with in particular $g := 1$ and $t := \square$. In accordance with Eq. (15), we define the messages $\pi_s\in\mathcal{L}(\mathcal{X}_{m(s)})$ and $\lambda_s\in\mathcal{L}(\mathcal{X}_s)$ recursively by:

$$\lambda_s := \prod_{c\in C(s)}\pi_c, \qquad s\in T \qquad (34)$$

and

$$\pi_s := \begin{cases}\lambda_s(x_s)\,\underline{Q}_s(\{x_s\}|X_{m(s)}) & \text{if } s\in E\\ \underline{Q}_s(\lambda_s|X_{m(s)}) & \text{if } s\in T\setminus E,\end{cases} \qquad (35)$$

with, as before, by convention $\lambda_s := 1$ in all leaves $s$. All these messages are non-negative by construction. It is clear from the recursion equations (7) and (8) [see also Eq. (16); the proof is completely similar to that of Eqs. (16) and (20) given above] that $\underline{P}_{\downarrow s}(\{x_{E\cap\downarrow s}\}|X_{m(s)}) = \pi_s$ for all $s\in T$. Now it follows from the recursion equations (34) and (35) and the assumption $\underline{P}(\{x_E\}) = \pi_\square > 0$ that $\lambda_e(x_e) > 0$ for all $e\in E$. Again applying Eq. (34), we find that indeed $\pi_c(x_e) > 0$ for all $c\in C(e)$. □
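The recursion (34)-(35) is a message-passing scheme that visits every node exactly once, which is where the linear complexity comes from. Below is a minimal Python sketch, assuming that every local model $\underline{Q}_s(\cdot|X_{m(s)})$ is given, for each value of the mother variable, as a finite list of extreme mass functions, so that its lower prevision is a minimum of expectations. The data layout and the name `lower_evidence_probability` are hypothetical. The returned value is $\pi_\square$, which by the argument above equals $\underline{P}(\{x_E\})$.

```python
from math import prod


def lower_evidence_probability(states, parent, local, evidence):
    """Return pi at the root, computed with the messages lambda_s and pi_s.

    states[s]  -- list of possible values of X_s
    parent[s]  -- mother node m(s), or None for the root
    local[s]   -- root: list of extreme mass functions of its local model;
                  other nodes: dict mapping each value of X_{m(s)} to a list of
                  extreme mass functions of Q_s(. | x_{m(s)})
    evidence   -- dict node -> observed value (its key set is E)
    """
    children = {s: [] for s in states}
    root = None
    for s, m in parent.items():
        if m is None:
            root = s
        else:
            children[m].append(s)

    def lower_q(extremes, gamble):
        # lower prevision = minimum expectation over the extreme points
        return min(sum(p[x] * gamble[x] for x in p) for p in extremes)

    def pi(s):
        # Eq. (34): lambda_s is the product of the children's messages (1 at leaves)
        child_msgs = [pi(c) for c in children[s]]
        lam = {x: prod(msg[x] for msg in child_msgs) for x in states[s]}

        def message(extremes):
            if s in evidence:  # Eq. (35), case s in E
                xs = evidence[s]
                indicator = {x: 1.0 if x == xs else 0.0 for x in states[s]}
                return lam[xs] * lower_q(extremes, indicator)
            return lower_q(extremes, lam)  # Eq. (35), case s not in E

        if parent[s] is None:
            return message(local[s])  # the root message is a single number
        return {xm: message(local[s][xm]) for xm in states[parent[s]]}

    return pi(root)
```

With precise local models (one extreme point each) the result is the ordinary probability of the evidence; with $E = T$ it computes the lower probability of a full configuration, as in the proof of Proposition 3.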

Our proof of Theorem 5 relies heavily on a very convenient coherence result proved by Enrique Miranda [16, Theorem 6], which we recall here to make the paper more self-contained. We use the notations established in the context of Section 3.

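Theorem 6. Let $\underline{P}$ be a (separately) coherent lower prevision on $\mathcal{L}(\mathcal{X}_N)$, and consider $m$ disjoint pairs of subsets $I_k$ and $O_k$ of $N$, $k = 1,\dots,m$. Assume that $\underline{P}(\{x_{I_k}\}) > 0$ for all $x_{I_k}\in\mathcal{X}_{I_k}$, $k = 1,\dots,m$, and use regular extension to define the conditional lower previsions $\underline{R}(\cdot|X_{I_k})$ on $\mathcal{L}(\mathcal{X}_{O_k\cup I_k})$, for $k = 1,\dots,m$. Then the (conditional) lower previsions $\underline{P}$, $\underline{R}(\cdot|X_{I_1}),\dots,\underline{R}(\cdot|X_{I_m})$ are (jointly) coherent.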

Proof of Theorem 5. We begin by showing that the family of models $\mathcal{M}(\underline{P})$ satisfies requirements T1-T3.
To prove T1, consider any non-terminal node $s$ and any $f\in\mathcal{L}(\mathcal{X}_s)$. Then it follows from separate coherence that $\underline{P}_{\downarrow C(s)}(f|X_s) = f$, and therefore we infer from the recursion equation (8) that indeed $\underline{P}_{\downarrow s}(f|X_{m(s)}) = \underline{Q}_s\big(\underline{P}_{\downarrow C(s)}(f|X_s)\big|X_{m(s)}\big) = \underline{Q}_s(f|X_{m(s)})$.

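Next, we turn to the proof of T2 and T3. Consider any non-terminal node $s$, any $S\subseteq C(s)$ and any $R\subseteq T\setminus(\{s\}\cup{\downarrow}S)$. Let $x_{\{s\}\cup R}\in\mathcal{X}_{\{s\}\cup R}$ and $f\in\mathcal{L}(\mathcal{X}_{\downarrow S\cup\{s\}\cup R})$. We calculate the following regular extension of the joint:

$$\underline{R}(f|x_{\{s\}\cup R}) = \max\big\{\mu\in\mathbb{R} : \underline{P}(I_{\{x_{\{s\}\cup R}\}}[f-\mu]) \geq 0\big\}. \qquad (36)$$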
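When the conditioning event $B$ has positive lower probability (as is the case here, by Proposition 3), the map $\mu \mapsto \underline{P}(I_B[f-\mu])$ is continuous and strictly decreasing, so the maximum in (36) is its unique root and can be found numerically. A minimal sketch, assuming a lower prevision on a finite space given as the lower envelope of finitely many mass functions; the names are hypothetical, and bisection is simply our choice of root finder, not a method taken from the paper.

```python
def lower_from_extremes(extremes):
    """Lower envelope of finitely many mass functions (dicts outcome -> probability)."""
    def lower(gamble):
        return min(sum(p[w] * gamble[w] for w in p) for p in extremes)
    return lower


def gbr_lower(lower, f, event, iterations=200):
    """Return max{mu : lower(I_B (f - mu)) >= 0} by bisection.

    Assumes lower(I_B) > 0, so that mu -> lower(I_B (f - mu)) is continuous and
    strictly decreasing; the root then lies between min_B f and max_B f.
    """
    def rho(mu):
        return lower({w: (f[w] - mu) if w in event else 0.0 for w in f})

    lo = min(f[w] for w in event)  # rho(lo) >= 0, since I_B (f - lo) >= 0
    hi = max(f[w] for w in event)  # rho(hi) <= 0, since I_B (f - hi) <= 0
    for _ in range(iterations):    # fixed iteration count avoids float stalls
        mid = 0.5 * (lo + hi)
        if rho(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return lo
```

For a finitely generated credal set with positive lower probability of $B$, the result coincides with the minimum of the conditional expectations $E_p(f|B)$ over the extreme points, which the test below uses as a cross-check.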

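Consider that $I_{\{x_{\{s\}\cup R}\}}[f-\mu] = I_{\{x_{\{s\}\cup R}\}}[g-\mu]$, where $g := f(\cdot,x_s,x_R)\in\mathcal{L}(\mathcal{X}_{\downarrow S})$. Let $t_2$ be the unique child of $t_1 := \square$ such that $s\in{\downarrow}t_2$ [assuming of course that $s \neq t_1 = \square$]. By using separate coherence, recursion equations (7), (8) and (10), and the strong factorisation property [see Proposition 1] of the conditionally independent natural extension, in a way similar to the argumentation in Section 6.2, we see that

$$\underline{P}(I_{\{x_{\{s\}\cup R}\}}[f-\mu]) = \underline{Q}_{t_1}\Big(\underline{P}_{\downarrow C(t_1)}\big(h_2\,\underline{P}_{\downarrow t_2}(I_{\{x_{(\{s\}\cup R)\cap\downarrow t_2}\}}[g-\mu]|X_{t_1})\big|X_{t_1}\big)\Big),$$

where $h_2 := I_{\{x_{R\setminus\downarrow t_2}\}} \geq 0$. Similarly, let $t_3$ be the unique child of $t_2$ such that $s\in{\downarrow}t_3$ [assuming of course that $s \neq t_2$]. Then we see in the same way as above that

$$\underline{P}_{\downarrow t_2}(I_{\{x_{(\{s\}\cup R)\cap\downarrow t_2}\}}[g-\mu]|X_{t_1}) = \underline{Q}_{t_2}\Big(\underline{P}_{\downarrow C(t_2)}\big(h_3\,\underline{P}_{\downarrow t_3}(I_{\{x_{(\{s\}\cup R)\cap\downarrow t_3}\}}[g-\mu]|X_{t_2})\big|X_{t_2}\big)\Big|X_{t_1}\Big),$$

where $h_3 := I_{\{x_{R\cap(\downarrow t_2\setminus\downarrow t_3)}\}} \geq 0$. Continuing in this way, we eventually come to the conclusion that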

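$$\underline{P}(I_{\{x_{\{s\}\cup R}\}}[f-\mu]) = G\big(\underline{P}_{\downarrow C(s)}(h\,I_{\{x_s\}}[g-\mu]|X_s)\big), \qquad (37)$$

where $h := I_{\{x_{R\cap\downarrow C(s)}\}}$, and where the real functional $G$ on $\mathcal{L}(\mathcal{X}_s)$ is essentially constructed as follows. Consider the segment $t_1t_2\cdots t_{r-1}t_r$ connecting $\square$ and $s$, i.e., $t_r := s$, $t_{r-1} := m(t_r)$, ..., $t_k := m(t_{k+1})$, ..., $t_1 := m(t_2) = \square$. Then there are non-negative gambles $h_{k+1}$ (constructed as $h_2$ and $h_3$ above) such that for all $F\in\mathcal{L}(\mathcal{X}_s)$, $G(F) = \underline{Q}_{t_1}(F_1)$, where $F_r := F$ and $F_k := \underline{P}_{\downarrow C(t_k)}\big(h_{k+1}\,\underline{Q}_{t_{k+1}}(F_{k+1}|X_{t_k})\big|X_{t_k}\big)$, $k = 1,\dots,r-1$. [If $r = 1$, or in other words $s = t_1 = \square$, just let $G := \underline{Q}_\square$.] In other words, the functional $G$ results from recursively multiplying with non-negative maps and applying global conditional lower previsions. As such, $G$ is non-negatively homogeneous and superadditive [because the successive multiplication and composition preserve these properties]. In addition, it depends neither on $g$ nor on $\mu$. If we use the separate coherence of $\underline{P}_{\downarrow C(s)}(\cdot|X_s)$, the strong factorisation, associativity and marginalisation properties of the conditionally independent natural extension $\underline{P}_{\downarrow C(s)}(\cdot|X_s)$ [see Proposition 1, Eqs. (4) and (5), and the recursion equations (7) and (10)], and the separate coherence of the conditional lower prevision $\underline{P}_{\downarrow S}(\cdot|X_s)$, we get:

$$\underline{P}_{\downarrow C(s)}(h\,I_{\{x_s\}}[g-\mu]|X_s) = I_{\{x_s\}}\,\underline{P}_{\downarrow C(s)}(h[g-\mu]|X_s) = \begin{cases} I_{\{x_s\}}\,\underline{P}_{\downarrow C(s)}(h|X_s)\,[\underline{P}_{\downarrow S}(g|X_s)-\mu] & \text{if } \underline{P}_{\downarrow S}(g|X_s)\geq\mu\\ I_{\{x_s\}}\,\overline{P}_{\downarrow C(s)}(h|X_s)\,[\underline{P}_{\downarrow S}(g|X_s)-\mu] & \text{if } \underline{P}_{\downarrow S}(g|X_s)<\mu.\end{cases} \qquad (38)$$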

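Combining Eqs. (37) and (38), and invoking the non-negative homogeneity of the real functional $G$, this leads to:

$$\underline{P}(I_{\{x_{\{s\}\cup R}\}}[f-\mu]) = \begin{cases} G(I_{\{x_s\}})\,\underline{P}_{\downarrow C(s)}(h|x_s)\,[\underline{P}_{\downarrow S}(g|x_s)-\mu] & \text{if } \underline{P}_{\downarrow S}(g|x_s)\geq\mu\\ \overline{G}(I_{\{x_s\}})\,\overline{P}_{\downarrow C(s)}(h|x_s)\,[\underline{P}_{\downarrow S}(g|x_s)-\mu] & \text{if } \underline{P}_{\downarrow S}(g|x_s)<\mu,\end{cases}$$

where we let $\overline{G}(I_{\{x_s\}}) := -G(-I_{\{x_s\}})$. By letting $f = \mu\pm1$ [and therefore also $g = \mu\pm1$] in this expression, we derive in particular that

$$\underline{P}(\{x_{\{s\}\cup R}\}) = G(I_{\{x_s\}})\,\underline{P}_{\downarrow C(s)}(h|x_s) \quad\text{and}\quad \overline{P}(\{x_{\{s\}\cup R}\}) = \overline{G}(I_{\{x_s\}})\,\overline{P}_{\downarrow C(s)}(h|x_s),$$

and therefore also

$$\underline{P}(I_{\{x_{\{s\}\cup R}\}}[f-\mu]) = \begin{cases}\underline{P}(\{x_{\{s\}\cup R}\})\,[\underline{P}_{\downarrow S}(g|x_s)-\mu] & \text{if } \underline{P}_{\downarrow S}(g|x_s)\geq\mu\\ \overline{P}(\{x_{\{s\}\cup R}\})\,[\underline{P}_{\downarrow S}(g|x_s)-\mu] & \text{if } \underline{P}_{\downarrow S}(g|x_s)<\mu.\end{cases}$$

Since we have assumed that all local models $\underline{Q}_s(\cdot|X_{m(s)})$ are strictly positive, we gather from Proposition 3 that $\underline{P}(\{x_{\{s\}\cup R}\}) > 0$, and therefore

$$\underline{P}(I_{\{x_{\{s\}\cup R}\}}[f-\mu]) \geq 0 \iff \underline{P}_{\downarrow S}(g|x_s) \geq \mu.$$

This allows us to infer from Eq. (36) that

$$\underline{R}(f|x_{\{s\}\cup R}) = \underline{P}_{\downarrow S}(f(\cdot,x_s,x_R)|x_s) \quad\text{for all } f\in\mathcal{L}(\mathcal{X}_{\downarrow S\cup\{s\}\cup R}) \text{ and } x_{\{s\}\cup R}\in\mathcal{X}_{\{s\}\cup R}. \qquad (39)$$

If we now combine Eq. (39) with Theorem 6, we see that both T2 and T3 hold.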

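To complete the proof, we consider T4. Consider any family of models $\mathcal{M}(\underline{V})$ that satisfies conditions T1-T3. Then we want to show that

$$\underline{V}_{\downarrow S}(\cdot|X_s) \geq \underline{P}_{\downarrow S}(\cdot|X_s) \quad\text{for all non-terminal } s \text{ and all non-empty } S\subseteq C(s) \qquad (40)$$

and

$$\underline{V} \geq \underline{P}. \qquad (41)$$

The proof proceeds in a recursive (inductive) fashion. Since the $\underline{V}_{\downarrow s}(\cdot|X_{m(s)})$ satisfy T1, we infer in particular that

$$\underline{V}_{\downarrow t}(\cdot|X_{m(t)}) = \underline{Q}_t(\cdot|X_{m(t)}) = \underline{P}_{\downarrow t}(\cdot|X_{m(t)}) \quad\text{for all terminal nodes } t.$$

It is therefore clearly sufficient to prove the following statement for all non-terminal nodes $t$:

$$\big(\forall c\in C(t)\big)\ \underline{V}_{\downarrow c}(\cdot|X_t) \geq \underline{P}_{\downarrow c}(\cdot|X_t) \;\Rightarrow\; \begin{cases}\underline{V}_{\downarrow S}(\cdot|X_t) \geq \underline{P}_{\downarrow S}(\cdot|X_t) & \text{for all non-empty } S\subseteq C(t)\\ \underline{V}_{\downarrow t}(\cdot|X_{m(t)}) \geq \underline{P}_{\downarrow t}(\cdot|X_{m(t)}).\end{cases} \qquad (42)$$

This is what we now set out to do. Fix any non-terminal node $t$ and any non-empty $S\subseteq C(t)$, and assume that $\underline{V}_{\downarrow c}(\cdot|X_t) \geq \underline{P}_{\downarrow c}(\cdot|X_t)$ for all $c\in C(t)$.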

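First of all, define for any disjoint proper subsets $I$ and $O$ of $S$ the conditional lower previsions $\underline{V}_{\downarrow O}(\cdot|X_{\{t\}\cup\downarrow I})$ through:

$$\underline{V}_{\downarrow O}(f|x_{\{t\}\cup\downarrow I}) := \underline{V}_{\downarrow O}(f(\cdot,x_{\downarrow I})|x_t) \quad\text{for all } f\in\mathcal{L}(\mathcal{X}_{\downarrow O\cup\downarrow I}) \text{ and all } x_{\{t\}\cup\downarrow I}\in\mathcal{X}_{\{t\}\cup\downarrow I}.$$

Then we infer from T3 [with $S = O$, $s = t$ and $R = {\downarrow}I$] that all these conditional lower previsions are in particular (jointly) coherent with the conditional lower prevision $\underline{V}_{\downarrow S}(\cdot|X_t)$. If we recall Definition 3 [with the index sets ${\downarrow}c$, $c\in S$, and $Y = X_t$], we conclude that $\underline{V}_{\downarrow S}(\cdot|X_t)$ is a conditionally independent product of the 'marginals' $\underline{V}_{\downarrow c}(\cdot|X_t)$, $c\in S$, which therefore dominates the smallest such independent product:

$$\underline{V}_{\downarrow S}(\cdot|X_t) \geq \bigotimes_{c\in S}\underline{V}_{\downarrow c}(\cdot|X_t),$$

and therefore, using the assumption, we infer from this inequality that

$$\underline{V}_{\downarrow S}(\cdot|X_t) \geq \bigotimes_{c\in S}\underline{V}_{\downarrow c}(\cdot|X_t) \geq \bigotimes_{c\in S}\underline{P}_{\downarrow c}(\cdot|X_t) = \underline{P}_{\downarrow S}(\cdot|X_t), \qquad (43)$$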

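where we have also used, successively, the monotonicity property of the conditionally independent natural extension [see [11] for a proof] and the recursion equations (7) and (11).

Next, define the conditional lower prevision $\underline{V}_{\downarrow C(t)}(\cdot|X_{\{m(t),t\}})$ on $\mathcal{L}(\mathcal{X}_{\{m(t)\}\cup\downarrow t})$ through:

$$\underline{V}_{\downarrow C(t)}(f|x_{\{m(t),t\}}) := \underline{V}_{\downarrow C(t)}(f(x_{m(t)},x_t,\cdot)|x_t) \quad\text{for all } f\in\mathcal{L}(\mathcal{X}_{\{m(t)\}\cup\downarrow t}) \text{ and all } x_{\{m(t),t\}}\in\mathcal{X}_{\{m(t),t\}}. \qquad (44)$$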
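Then we infer from T3 [with $s = t$, $S = C(t)$ and $R = \{m(t)\}$] that this conditional lower prevision $\underline{V}_{\downarrow C(t)}(\cdot|X_{\{m(t),t\}})$ is in particular (jointly) coherent with the conditional lower prevision $\underline{V}_{\downarrow t}(\cdot|X_{m(t)})$ defined on $\mathcal{L}(\mathcal{X}_{\{m(t)\}\cup\downarrow t})$. We then see that for all $g\in\mathcal{L}(\mathcal{X}_{\downarrow t})$:

$$\begin{aligned}
\underline{V}_{\downarrow t}(g|X_{m(t)}) &\geq \underline{V}_{\downarrow t}\big(\underline{V}_{\downarrow C(t)}(g|X_{\{m(t),t\}})\big|X_{m(t)}\big) = \underline{V}_{\downarrow t}\big(\underline{V}_{\downarrow C(t)}(g|X_t)\big|X_{m(t)}\big) = \underline{Q}_t\big(\underline{V}_{\downarrow C(t)}(g|X_t)\big|X_{m(t)}\big)\\
&\geq \underline{Q}_t\big(\underline{P}_{\downarrow C(t)}(g|X_t)\big|X_{m(t)}\big) = \underline{P}_{\downarrow t}(g|X_{m(t)}).
\end{aligned}$$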

The first equality follows from Eq. (44), the second one holds because the global models $\underline{V}_{\downarrow s}(\cdot|X_{m(s)})$ satisfy T1, and the third one follows from recursion equation (8). The first inequality follows if we apply Walley's Marginal Extension Theorem‡ [24, Theorem 6.7.2] in the formulation of [17, Theorem 4]. The second inequality follows from the inequality (43) and the non-decreasing character of $\underline{Q}_t(\cdot|X_{m(t)})$, which follows from separate coherence. This completes our proof that T4 is also satisfied.
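The first inequality above rests on marginal extension, the same construction as in recursion (8): a lower prevision on a joint space is obtained by first taking conditional lower previsions and then applying the marginal one. A minimal sketch for two finite variables, assuming the marginal model and each conditional model are given by finitely many extreme mass functions; it compares the composition $\underline{Q}(\underline{P}(f|X))$ with a brute-force minimum over all joint mass functions $p(x)q(y|x)$ whose factors are chosen independently for each $x$. The names and data layout are hypothetical.

```python
from itertools import product


def lower_exp(extremes, g):
    # lower prevision as the minimum expectation over extreme mass functions
    return min(sum(p[v] * g[v] for v in p) for p in extremes)


def marginal_extension(marg_ext, cond_ext, f):
    """Q(P(f|X)): inner conditional lower previsions for each x, then the marginal one."""
    inner = {x: lower_exp(cond_ext[x], {y: f[(x, y)] for y in cond_ext[x][0]})
             for x in marg_ext[0]}
    return lower_exp(marg_ext, inner)


def brute_force_joint(marg_ext, cond_ext, f):
    """Minimum of the joint expectation over all extreme combinations p(x) q_x(y)."""
    xs = list(marg_ext[0])
    best = float("inf")
    for p in marg_ext:
        for choice in product(*(cond_ext[x] for x in xs)):
            q = dict(zip(xs, choice))
            val = sum(p[x] * q[x][y] * f[(x, y)] for x in xs for y in q[x])
            best = min(best, val)
    return best
```

In the precise case, with one mass function everywhere, both functions reduce to the Law of Iterated Expectations.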

The last part of the proof follows at once from Eq. (39) [with $R = \emptyset$] and Theorem 6.

□
‡ Recall that this is a coherence result that generalises the so-called Law of Iterated Expectations to coherent lower previsions.



